References in C++

Introduction

References in C++ are powerful and versatile constructs that play a crucial role in manipulating data efficiently. They provide a way to create aliases for variables, enabling us to work with the same data under different names. In this chapter, we will delve into the concept of references in C++, exploring their syntax, usage and benefits.

Introduction to References

A reference in C++ is essentially an alias or alternative name for an existing variable. Unlike pointers, references cannot be null and must be initialized upon declaration. They are declared using the & symbol and are used to refer to an already existing variable. The following program declares a reference:

int main() {
    int originalVariable = 42;
    int& referenceVariable = originalVariable;
    
    // Now, referenceVariable is an alias for originalVariable
    return 0;
}

In this example, referenceVariable is a reference to originalVariable, meaning any change made through one name is visible through the other.
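As a minimal sketch of this aliasing behavior, the following writes through the reference and then reads the original variable, and also compares the addresses of the two names. The function names modifyThroughAlias and sameAddress are hypothetical and chosen only for illustration.

```cpp
// Hypothetical helper: writes through the reference, then reads the original.
int modifyThroughAlias()
{
    int originalVariable = 42;
    int& referenceVariable = originalVariable;

    referenceVariable = 100;  // assigns to originalVariable via its alias
    return originalVariable;  // originalVariable now holds 100
}

// Hypothetical helper: both names designate the same object,
// so taking the address of either yields the same pointer.
bool sameAddress()
{
    int originalVariable = 42;
    int& referenceVariable = originalVariable;

    return &referenceVariable == &originalVariable;
}
```

Calling modifyThroughAlias() should return 100, and sameAddress() should return true.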

Syntax and Declaration:

To declare a reference, the syntax is as follows:

dataType& referenceName = existingVariable;

Here, dataType is the type of the referenced variable, and referenceName is the name of the reference.

Initialization Of References

Much like constants, all references must be initialized.

int main()
{
    int& invalidRef;   // error: references must be initialized

    int x { 5 };
    int& ref { x }; // okay: reference to int is bound to int variable

    return 0;
}

When a reference is initialized with an object (or function), we say it is bound to that object (or function). The process by which such a reference is bound is called reference binding. The object (or function) being referenced is sometimes called the referent.
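Binding a reference to a function is less common than binding to an object, so a short sketch may help. The names twice, doubler, and callThroughFunctionReference are hypothetical and used only for illustration.

```cpp
// Hypothetical function that will act as the referent.
int twice(int n)
{
    return n * 2;
}

// Hypothetical helper: binds a reference to the function twice and calls it.
int callThroughFunctionReference(int n)
{
    int (&doubler)(int) = twice; // doubler is a reference bound to the function twice
    return doubler(n);           // calls twice(n) through the reference
}
```

With these definitions, callThroughFunctionReference(21) should return 42.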

Lvalue references to non-const must be bound to a modifiable lvalue.

int main()
{
    int x { 5 };
    int& ref { x }; // valid: lvalue reference bound to a modifiable lvalue

    const int y { 5 };
    int& invalidRef { y };  // invalid: can't bind to a non-modifiable lvalue
    int& invalidRef2 { 0 }; // invalid: can't bind to an rvalue

    return 0;
}

Lvalue references to non-const can't be bound to non-modifiable lvalues or rvalues (otherwise you would be able to change those values through the reference, which would violate their const-ness).

The type of the reference must match the type of the referent (there are some exceptions to this rule):

int main()
{
    int x { 5 };
    int& ref { x }; // okay: reference to int is bound to int variable

    double y { 6.0 };
    int& invalidRef { y }; // invalid: reference to int cannot bind to double variable
    double& invalidRef2 { x }; // invalid: reference to double cannot bind to int variable

    return 0;
}

References can't be reseated (changed to refer to another object)

Once initialized, a reference in C++ cannot be reseated, meaning it cannot be changed to reference another object.

New C++ programmers often try to reseat a reference by using assignment to provide the reference with another variable to reference. This will compile and run – but not function as expected. Consider the following program:

#include <iostream>

int main()
{
    int x { 5 };
    int y { 6 };

    int& ref { x }; // ref is now an alias for x

    ref = y; // assigns 6 (the value of y) to x (the object being referenced by ref)
    // The above line does NOT change ref into a reference to variable y!

    std::cout << x << '\n'; // user is expecting this to print 5

    return 0;
}

Perhaps surprisingly, this prints:

6

When a reference is evaluated in an expression, it resolves to the object it's referencing. So ref = y doesn't change ref to now reference y. Rather, because ref is an alias for x, the expression evaluates as if it were written x = y. Since y evaluates to the value 6, x is assigned the value 6.

References and referents have independent lifetimes

  • A reference can be destroyed before the object it is referencing.
  • The object being referenced can be destroyed before the reference.

When a reference is destroyed before the referent, the referent is not impacted. The following program demonstrates this:

#include <iostream>

int main()
{
    int x { 5 };

    {
        int& ref { x };   // ref is a reference to x
        std::cout << ref << '\n'; // prints value of ref (5)
    } // ref is destroyed here -- x is unaware of this

    std::cout << x << '\n'; // prints value of x (5)

    return 0;
} // x destroyed here

This prints:

5
5

When ref dies, variable x carries on as normal, blissfully unaware that a reference to it has been destroyed.

Dangling references

When an object being referenced is destroyed before a reference to it, the reference is left referencing an object that no longer exists. Such a reference is called a dangling reference. Accessing a dangling reference leads to undefined behavior.

int& getReference() {
    int x = 42;
    return x; // Returning a reference to a local variable
}

// This would lead to undefined behavior
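Two common ways to avoid this problem can be sketched as follows: return by value, or return a reference only to an object that outlives the function call. The functions getValue and largerOf are hypothetical and serve only as an illustration.

```cpp
// Safe: returns a copy of the local variable,
// so nothing refers to x after it is destroyed.
int getValue()
{
    int x { 42 };
    return x;
}

// Safe: the returned reference refers to one of the caller's objects,
// both of which outlive the function call.
int& largerOf(int& a, int& b)
{
    return (a > b) ? a : b;
}
```

For example, with int a { 3 } and int b { 8 }, the statement largerOf(a, b) = 10; assigns 10 to b, because the returned reference is bound to the caller's b.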

Lvalue reference to const

By using the const keyword when declaring an lvalue reference, we tell the reference to treat the object it is referencing as const. Such a reference is called an lvalue reference to a const value (sometimes called a reference to const or a const reference).

References to const can bind to non-modifiable lvalues.

int main()
{
    const int x { 5 };    // x is a non-modifiable lvalue
    const int& ref { x }; // okay: ref is an lvalue reference to a const value

    return 0;
}

Because references to const treat the object they are referencing as const, they can be used to access but not modify the value being referenced:

#include <iostream>

int main()
{
    const int x { 5 };    // x is a non-modifiable lvalue
    const int& ref { x }; // okay: ref is an lvalue reference to a const value

    std::cout << ref << '\n'; // okay: we can access the const object
    ref = 6;                  // error: we can not modify an object through a const reference

    return 0;
}

Initializing a reference to const with a modifiable lvalue

References to const can also bind to modifiable lvalues. In such a case, the object being referenced is treated as const when accessed through the reference (even though the underlying object is non-const):

#include <iostream>

int main()
{
    int x { 5 };          // x is a modifiable lvalue
    const int& ref { x }; // okay: we can bind a const reference to a modifiable lvalue

    std::cout << ref << '\n'; // okay: we can access the object through our const reference
    ref = 7;                  // error: we can not modify an object through a const reference

    x = 6;                // okay: x is a modifiable lvalue, we can still modify it through the original identifier

    return 0;
}

In the above program, we bind const reference ref to modifiable lvalue x. We can then use ref to access x, but because ref is const, we cannot modify the value of x through ref. However, we can still modify the value of x directly (using the identifier x).
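Because ref remains an alias for x, a change made directly to x is visible when reading through ref. A minimal sketch of this, using the hypothetical function name readAfterDirectWrite:

```cpp
// Hypothetical helper: modifies x directly, then reads it through a const reference.
int readAfterDirectWrite()
{
    int x { 5 };
    const int& ref { x }; // ref is a read-only view of x

    x = 6;                // allowed: x itself is not const

    return ref;           // reads the current value of x through ref
}
```

Here readAfterDirectWrite() should return 6, not 5: the const applies to access through ref, not to the underlying object.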

Key Characteristics of References

  • Initialization: References must be initialized when declared, and once initialized, they cannot be reassigned to refer to another variable. This makes them safer and more straightforward than pointers.
  • No Null References: Unlike pointers, references cannot be null. A reference must be bound to an object or function when it is created, although it can still be left dangling if the referent is destroyed first.
  • Syntax: The syntax for references uses the & symbol, but it is essential to distinguish between the declaration of a reference and the address-of operator used with pointers.
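The two meanings of the & symbol mentioned in the last point can be sketched side by side. The function name ampersandMeanings and the variable names are hypothetical.

```cpp
// Hypothetical helper contrasting the two uses of the & symbol.
bool ampersandMeanings()
{
    int value { 7 };

    int& ref { value };  // & in a declaration: ref is a reference to value
    int* ptr { &value }; // & in an expression: address-of operator, yields a pointer to value

    return ptr == &ref;  // taking the address of ref gives the address of value
}
```

As a rule of thumb, & next to a type name declares a reference, while & in front of an existing object in an expression takes its address. The function above should return true.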

Passing by Reference

One of the most common use cases for references is in function parameters. Passing by reference allows a function to modify the original data directly, avoiding the overhead of copying large objects.

void modifyValue(int& value) {
    value *= 2;
}

int main() {
    int number = 5;
    modifyValue(number);
    // 'number' is now 10
    return 0;
}

By contrast, when an argument is passed by value, it is copied into the function's parameter:

#include <iostream>

void printValue(int y)
{
    std::cout << y << '\n';
} // y is destroyed here

int main()
{
    int x { 2 };

    printValue(x); // x is passed by value (copied) into parameter y (inexpensive)

    return 0;
}

In the above program, when printValue(x) is called, the value of x (2) is copied into parameter y. Then, at the end of the function, object y is destroyed.

This means that when we called the function, we made a copy of our argument's value, only to use it briefly and then destroy it! Fortunately, because fundamental types are cheap to copy, there isn't a problem.

Some objects are expensive to copy

Most of the types provided by the standard library (such as std::string) are class types. Class types are often expensive to copy, since making a copy may require allocating memory and duplicating the object's contents. Whenever possible, we want to avoid making unnecessary copies of objects that are expensive to copy, especially when we will destroy those copies almost immediately.

Consider the following program illustrating this point:

#include <iostream>
#include <string>

void printValue(std::string y)
{
    std::cout << y << '\n';
} // y is destroyed here

int main()
{
    std::string x { "Hello, world!" }; // x is a std::string

    printValue(x); // x is passed by value (copied) into parameter y (expensive)

    return 0;
}

This prints:

Hello, world!

While this program behaves as we expect, it's also inefficient. As in the prior example, when printValue() is called, argument x is copied into printValue() parameter y. However, in this example, the argument is a std::string instead of an int, and std::string is a class type that is expensive to copy. This expensive copy is made every time printValue() is called.

Pass by reference

One way to avoid making an expensive copy of an argument when calling a function is to use pass by reference instead of pass by value. When using pass by reference, we declare a function parameter as a reference type (or const reference type) rather than as a normal type. When this function is called, each reference parameter is bound to the appropriate argument. Because the reference acts as an alias for the argument, no copy of the argument is made.

Here's the same example as above, using pass by reference instead of pass by value:

#include <iostream>
#include <string>

void printValue(std::string& y) // type changed to std::string&
{
    std::cout << y << '\n';
} // y is destroyed here

int main()
{
    std::string x { "Hello, world!" };

    printValue(x); // x is now passed by reference into reference parameter y (inexpensive)

    return 0;
}

The program is identical to the prior one, except the type of parameter y has been changed from std::string to std::string&. Now, when printValue(x) is called, reference parameter y is bound to argument x. Binding a reference is always inexpensive, and no copy of x needs to be made. Because a reference acts as an alias for the object being referenced, when printValue() uses reference y, it's accessing the actual argument x (rather than a copy of x).
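One way to observe the difference is to count copies. The sketch below uses a hypothetical class Tracked whose copy constructor increments a hypothetical global counter copyCount. It is only a stand-in for an expensive-to-copy type and says nothing about how std::string is implemented.

```cpp
int copyCount { 0 }; // hypothetical counter: number of Tracked copies made so far

// Hypothetical stand-in for a type that is expensive to copy.
struct Tracked
{
    Tracked() = default;
    Tracked(const Tracked&) { ++copyCount; } // record every copy
};

void takeByValue(Tracked) {}      // parameter is a copy of the argument
void takeByReference(Tracked&) {} // parameter is bound to the argument

// Returns how many copies a single pass-by-value call makes.
int copiesMadeByValue()
{
    copyCount = 0;
    Tracked t;
    takeByValue(t);
    return copyCount;
}

// Returns how many copies a single pass-by-reference call makes.
int copiesMadeByReference()
{
    copyCount = 0;
    Tracked t;
    takeByReference(t);
    return copyCount;
}
```

Under these assumptions, copiesMadeByValue() should return 1 and copiesMadeByReference() should return 0.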

Pass by reference allows us to change the value of an argument

When an object is passed by value, the function parameter receives a copy of the argument. This means that any changes to the value of the parameter are made to the copy of the argument, not the argument itself:

#include <iostream>

void addOne(int y) // y is a copy of x
{
    ++y; // this modifies the copy of x, not the actual object x
}

int main()
{
    int x { 5 };

    std::cout << "value = " << x << '\n';

    addOne(x);

    std::cout << "value = " << x << '\n'; // x has not been modified

    return 0;
}

In the above program, because value parameter y is a copy of x, when we increment y, this only affects y. This program outputs:

value = 5
value = 5

However, since a reference acts identically to the object being referenced, when using pass by reference, any changes made to the reference parameter will affect the argument:

#include <iostream>

void addOne(int& y) // y is bound to the actual object x
{
    ++y; // this modifies the actual object x
}

int main()
{
    int x { 5 };

    std::cout << "value = " << x << '\n';

    addOne(x);

    std::cout << "value = " << x << '\n'; // x has been modified

    return 0;
}

// Output
value = 5
value = 6

In the above example, x initially has value 5. When addOne(x) is called, reference parameter y is bound to argument x. When the addOne() function increments reference y, it's actually incrementing argument x from 5 to 6 (not a copy of x). This changed value persists even after addOne() has finished executing.
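The same mechanism lets one function modify several arguments at once. As an illustrative sketch, here is a hypothetical swapValues function; in real code the standard library's std::swap would normally be used instead.

```cpp
// Hypothetical helper: exchanges the values of the two arguments.
// Both parameters are references, so the caller's objects are modified.
void swapValues(int& a, int& b)
{
    int temp { a };
    a = b;
    b = temp;
}
```

For example, with int x { 5 } and int y { 6 }, calling swapValues(x, y) leaves x holding 6 and y holding 5. Had the parameters been plain int, only the local copies would have been swapped.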

Pass by non-const reference can only accept modifiable lvalue arguments

Because a reference to a non-const value can only bind to a modifiable lvalue, pass by non-const reference only works with arguments that are modifiable lvalues.

#include <iostream>

void printValue(int& y) // y only accepts modifiable lvalues
{
    std::cout << y << '\n';
}

int main()
{
    int x { 5 };
    printValue(x); // ok: x is a modifiable lvalue

    const int z { 5 };
    printValue(z); // error: z is a non-modifiable lvalue

    printValue(5); // error: 5 is an rvalue

    return 0;
}

Unlike a reference to non-const (which can only bind to modifiable lvalues), a reference to const can bind to modifiable lvalues, non-modifiable lvalues, and rvalues. Therefore, if we make a reference parameter const, it will be able to bind to any of these kinds of argument:

#include <iostream>

void printValue(const int& y) // y is now a const reference
{
    std::cout << y << '\n';
}

int main()
{
    int x { 5 };
    printValue(x); // ok: x is a modifiable lvalue

    const int z { 5 };
    printValue(z); // ok: z is a non-modifiable lvalue

    printValue(5); // ok: 5 is a literal rvalue

    return 0;
}

Passing by const reference offers the same primary benefit as pass by reference (avoiding making a copy of the argument), while also guaranteeing that the function cannot change the value being referenced.

For example, the following is disallowed, because ref is const:

void addOne(const int& ref)
{
    ++ref; // not allowed: ref is const
}

In most cases, we don't want our functions modifying the value of arguments.

Favor passing by const reference over passing by non-const reference unless you have a specific reason to do otherwise (e.g., the function needs to change the value of an argument).

Now we can understand the motivation for allowing const lvalue references to bind to rvalues: without that capability, there would be no way to pass literals (or other rvalues) to functions that use pass by reference.
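To tie this back to the earlier std::string discussion, here is a minimal sketch of a function taking a const reference parameter. The name countChars is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: reads the string through a const reference.
// No copy of the caller's std::string object is made, and the function cannot modify it.
std::size_t countChars(const std::string& text)
{
    return text.size();
}
```

This one function can be called with a modifiable std::string, with a const std::string, and with a string literal such as countChars("Hello, world!"). In the last case a temporary std::string is created from the literal and the const reference binds to it for the duration of the call.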

References vs Pointers

The comparison between references and pointers is covered in a separate chapter.