There are many instances in programming where we need more than one variable to represent something. Suppose we need to write a program that stores information about the students in a school. We might want to keep track of attributes like a student's name, age, class, roll number, birthday, etc.
If we were to use independent variables to track all of this information, that might look something like this:
#include <string>
std::string name;
int rollNo;
int age;
std::string className; // "class" is a reserved keyword, so it can't be used as a variable name
int birthYear;
int birthMonth;
int birthDay;
However, there are a number of problems with this approach:
- It's not immediately clear whether these variables are actually related (one has to read the comments, or trace through the code, to find out).
- There are now 7 variables to manage. If we wanted to pass this student to a function, we would have to pass 7 arguments (in the correct order), which would make a mess of our function prototypes and function calls. And since a function can only return a single value, how would a function even return a student?
- And if we wanted more than one student, we would need to define 7 more variables for each additional student (each of which would require a unique name).
What we really need is some way to organize all of these related pieces of data together, to make them easier to manage.
Fortunately, C++ comes with two compound types designed to solve such challenges:
- structs: A struct (short for structure) is a program-defined data type that allows us to bundle multiple variables together into a single type.
- classes
Defining structs
Because structs are a program-defined type, we first have to tell the compiler what our struct type looks like before we can begin using it. Here is an example of a struct definition for a simplified student:
struct Student
{
int id {};
int age {};
double marks {};
};
The struct keyword is used to tell the compiler that we are defining a struct, which we have named Student (since program-defined types are typically given names starting with a capital letter).
Then, inside a pair of curly braces, we define the variables that each Student object will contain. In this example, each Student will have 3 variables: an int id, an int age, and a double marks. The variables that are part of the struct are called data members (or member variables).
In C++, a member is a variable, function, or type that belongs to a struct (or class). All members must be declared within the struct (or class) definition.
Just like we use an empty set of curly braces to value-initialize normal variables, the empty curly braces after each member variable ensure that the member variables inside our Student are value-initialized when a Student is created.
As a reminder, Student is just a type definition; no objects are actually created at this point.
Defining struct objects
In order to use the Student type, we simply define a variable of type Student:
Student dharma {}; // Student is the type, dharma is the variable name
This defines a variable of type Student named dharma. When the code is executed, a Student object is instantiated that contains the 3 data members. The empty braces ensure our object is value-initialized.
Just like any other type, it is possible to define multiple variables of the same struct type:
Student dharma {}; // create a Student struct for Dharma
Student sattu {}; // create a Student struct for Sattu
Accessing members
Consider the following example:
struct Student
{
int id {};
int age {};
double marks {};
};
int main()
{
Student dharma {};
return 0;
}
In the above example, the name dharma refers to the entire struct object (which contains the member variables). To access a specific member, we use the member selection operator (operator.) in between the struct variable name and the member name. For example, to access dharma's age member, we would use dharma.age.
Struct member variables work just like normal variables, so it is possible to do normal operations on them, including assignment, arithmetic, comparison, etc.
#include <iostream>
struct Student
{
int id {};
int age {};
double marks {};
};
int main()
{
Student dharma {};
dharma.age = 22; // use member selection operator (.) to select the age member of variable dharma
std::cout << dharma.age << '\n'; // print dharma's age
return 0;
}
This prints
22
One of the biggest advantages of structs is that we only need to create one new name per struct variable (the member names are fixed as part of the struct type definition). In the following example, we instantiate two Student objects: dharma and sattu.
#include <iostream>
struct Student
{
int id {};
int age {};
double marks {};
};
int main()
{
Student dharma {};
dharma.id = 14;
dharma.age = 32;
dharma.marks = 75.5;
Student sattu {};
sattu.id = 15;
sattu.age = 28;
sattu.marks = 69.0;
int totalAge { dharma.age + sattu.age };
if (dharma.marks > sattu.marks)
std::cout << "dharma got more marks than sattu\n";
else if (dharma.marks < sattu.marks)
std::cout << "dharma got fewer marks than sattu\n";
else
std::cout << "dharma and sattu got the same marks\n";
// sattu earned 2.5 bonus marks
sattu.marks += 2.5;
// Today is dharma's birthday
++dharma.age; // use pre-increment to increment dharma's age by 1
return 0;
}
// Output
dharma got more marks than sattu
Data members are not initialized by default
Much like normal local variables, data members that have no initializer are not initialized by default. Consider the following struct:
#include <iostream>
struct Student
{
int id; // note: no initializer here
int age;
double marks;
};
int main()
{
Student dharma; // note: no initializer here either
std::cout << dharma.id << '\n';
return 0;
}
Because we have not provided any initializers, when dharma is instantiated, dharma.id, dharma.age, and dharma.marks will all be uninitialized. We will then get undefined behavior when we try to print the value of dharma.id.
What is an aggregate?
In general programming, an aggregate data type (also called an aggregate) is any type that can contain multiple data members. Some types of aggregates allow members to have different types (e.g. structs), while others require that all members must be of a single type (e.g. arrays).
Aggregate initialization of a struct
Because a normal variable can only hold a single value, we only need to provide a single initializer:
int x { 5 };
However, a struct can have multiple members:
struct Student
{
int id {};
int age {};
double marks {};
};
When we define an object with a struct type, we need some way to initialize multiple members at initialization time:
Student dharma; // how do we initialize dharma.id, dharma.age, and dharma.marks?
Aggregates use a form of initialization called aggregate initialization, which allows us to directly initialize the members of aggregates. To do this, we provide an initializer list as an initializer, which is just a braced list of comma-separated values.
There are 2 primary forms of aggregate initialization:
struct Student
{
int id {};
int age {};
double marks {};
};
int main()
{
Student dharma = { 1, 32, 70.5 }; // copy-list initialization using braced list
Student sattu { 2, 28, 65.0 }; // list initialization using braced list (preferred)
return 0;
}
Each of these initialization forms does a member-wise initialization, which means each member in the struct is initialized in the order of declaration. Thus, Student sattu { 2, 28, 65.0 }; first initializes sattu.id with value 2, then sattu.age with value 28, and sattu.marks with value 65.0 last.
In C++20, we can also initialize (some) aggregates using a parenthesized list of values:
Student dharma (3, 45, 55.5); // direct initialization using parenthesized list (C++20)
Best practice:
Prefer the (non-copy) braced list form when initializing aggregates.
Missing initializers in an initializer list
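If an aggregate is initialized but the number of initialization values is fewer than the number of members, then all remaining members are value-initialized. (More precisely, a remaining member that has a default member initializer is initialized from it; otherwise it is value-initialized.)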
struct Student
{
int id {};
int age {};
double marks {};
};
int main()
{
Student dharma { 2, 28 }; // dharma.marks will be value-initialized to 0.0
return 0;
}
In the above example, dharma.id will be initialized with value 2, dharma.age will be initialized with value 28, and because dharma.marks wasn't given an explicit initializer, it will be initialized to 0.0.
This means we can use an empty initializer list to value-initialize all members of the struct:
Student dharma {}; // value-initialize all members
Const structs
Variables of a struct type can be const (or constexpr), and just like all const variables, they must be initialized.
struct Rectangle
{
double length {};
double width {};
};
int main()
{
const Rectangle unit { 1.0, 1.0 };
const Rectangle zero { }; // value-initialize all members
return 0;
}
Designated initializers C++20
When initializing a struct from a list of values, the initializers are applied to the members in order of declaration:
struct Foo
{
int a {};
int c {};
};
int main()
{
Foo f { 1, 3 }; // f.a = 1, f.c = 3
return 0;
}
Now consider what would happen if you were to update this struct definition to add a new member that is not the last member:
struct Foo
{
int a {};
int b {}; // just added
int c {};
};
int main()
{
Foo f { 1, 3 }; // now, f.a = 1, f.b = 3, f.c = 0
return 0;
}
Now all your initialization values have shifted, and worse, the compiler may not detect this as an error (after all, the syntax is still valid).
To help avoid this, C++20 adds a new way to initialize struct members called designated initializers. Designated initializers allow you to explicitly define which initialization values map to which members. Each member can be initialized using either braces or an equals sign (list or copy initialization), and the members must be initialized in the same order in which they are declared in the struct; otherwise an error will result. Members that are not given a designated initializer will be value-initialized.
struct Foo
{
int a{ };
int b{ };
int c{ };
};
int main()
{
Foo f1{ .a{ 1 }, .c{ 3 } }; // ok: f1.a = 1, f1.b = 0 (value initialized), f1.c = 3
Foo f2{ .a = 1, .c = 3 }; // ok: f2.a = 1, f2.b = 0 (value initialized), f2.c = 3
Foo f3{ .b{ 2 }, .a{ 1 } }; // error: initialization order does not match order of declaration in struct
return 0;
}
Assignment with an initializer list
As shown in the prior chapter, we can assign values to members of structs individually:
struct Employee
{
int id {};
int age {};
double wage {};
};
int main()
{
Employee joe { 1, 32, 60000.0 };
joe.age = 33; // Joe had a birthday
joe.wage = 66000.0; // and got a raise
return 0;
}
This is fine for single members, but not great when you want to update many members. Similar to initializing a struct with an initializer list, you can also assign values to a struct using an initializer list (which does member-wise assignment).
struct Employee
{
int id {};
int age {};
double wage {};
};
int main()
{
Employee joe { 1, 32, 60000.0 };
joe = { joe.id, 33, 66000.0 }; // Joe had a birthday and got a raise
return 0;
}
Note that because we didn't want to change joe.id, we needed to provide the current value of joe.id in our list as a placeholder, so that the member-wise assignment could assign joe.id to joe.id. This is a bit ugly.
Assignment with designated initializers C++20
Designated initializers can also be used in a list assignment:
struct Employee
{
int id {};
int age {};
double wage {};
};
int main()
{
Employee joe { 1, 32, 60000.0 };
joe = { .id = joe.id, .age = 33, .wage = 66000.0 }; // Joe had a birthday and got a raise
return 0;
}
Any members that aren't designated in such an assignment will be assigned the value that would be used for value initialization. If we had not specified a designated initializer for joe.id, joe.id would have been assigned the value 0.
Initializing a struct with another struct of the same type
A struct may also be initialized using another struct of the same type:
#include <iostream>
struct Foo
{
int a{};
int b{};
int c{};
};
int main()
{
Foo foo { 1, 2, 3 };
Foo x = foo; // copy initialization
Foo y(foo); // direct initialization
Foo z {foo}; // list initialization
std::cout << x.a << ' ' << y.b << ' ' << z.c << '\n';
return 0;
}
The above prints:
1 2 3
Default member initialization
When we define a struct (or class) type, we can provide a default initialization value for each member as part of the type definition. This process is called non-static member initialization, and the initialization value is called a default member initializer.
struct Something
{
int x; // no initialization value (bad)
int y {}; // value-initialized by default
int z { 2 }; // explicit default value
};
int main()
{
Something s1; // s1.x is uninitialized, s1.y is 0, and s1.z is 2
return 0;
}
In the above definition of Something, x has no default value, y is value-initialized by default, and z has the default value 2. These default member initialization values will be used if the user doesn't provide an explicit initialization value when instantiating an object of type Something.
Our s1 object doesn't have an initializer, so the members of s1 are initialized to their default values. s1.x has no default initializer, so it remains uninitialized. s1.y is value-initialized by default, so it gets the value 0. And s1.z is initialized with the value 2.
Note that even though we haven't provided an explicit initializer for s1.z, it is initialized to a non-zero value because of the default member initializer provided.
Explicit initialization values take precedence over default values
Explicit values in a list initializer always take precedence over default member initialization values.
struct Something
{
int x; // no default initialization value (bad)
int y {}; // value-initialized by default
int z { 2 }; // explicit default value
};
int main()
{
Something s2 { 5, 6, 7 }; // use explicit initializers for s2.x, s2.y, and s2.z (no default values are used)
return 0;
}
In this case, s2 has explicit initialization values for every member, so the default member initialization values are not used at all.
Passing and returning structs
Passing structs (by reference)
A big advantage of using structs over individual variables is that we can pass the entire struct to a function that needs to work with the members. Structs are generally passed by (const) reference to avoid making copies.
#include <iostream>
struct Employee
{
int id {};
int age {};
double wage {};
};
void printEmployee(const Employee& employee) // note pass by reference here
{
std::cout << "ID: " << employee.id << '\n';
std::cout << "Age: " << employee.age << '\n';
std::cout << "Wage: " << employee.wage << '\n';
}
int main()
{
Employee joe { 14, 32, 24.15 };
Employee frank { 15, 28, 18.27 };
// Print Joe's information
printEmployee(joe);
std::cout << '\n';
// Print Frank's information
printEmployee(frank);
return 0;
}
In the above example, we pass an entire Employee to printEmployee() (twice, once for joe and once for frank).
The above program outputs:
ID: 14
Age: 32
Wage: 24.15
ID: 15
Age: 28
Wage: 18.27
Because we are passing the entire struct object (rather than individual members), we only need one parameter no matter how many members the struct object has. And, in the future, if we ever decide to add new members to our Employee struct, we will not have to change the function declaration or function call. The new member will automatically be included.
Returning structs
Consider the case where we have a function that needs to return a point in 3-dimensional Cartesian space. Such a point has 3 attributes: an x-coordinate, a y-coordinate, and a z-coordinate. But functions can only return one value. So how do we return all 3 coordinates back to the caller?
One common way is to return a struct:
#include <iostream>
struct Point3d
{
double x { 0.0 };
double y { 0.0 };
double z { 0.0 };
};
Point3d getZeroPoint()
{
// We can create a variable and return the variable (we'll improve this below)
Point3d temp { 0.0, 0.0, 0.0 };
return temp;
}
int main()
{
Point3d zero{ getZeroPoint() };
if (zero.x == 0.0 && zero.y == 0.0 && zero.z == 0.0)
std::cout << "The point is zero\n";
else
std::cout << "The point is not zero\n";
return 0;
}
This prints:
The point is zero
Structs are usually returned by value, so as not to return a dangling reference to a local object that is destroyed when the function returns.
In the getZeroPoint() function above, we create a new named object (temp) just so we could return it:
Point3d getZeroPoint()
{
// We can create a variable and return the variable (we'll improve this below)
Point3d temp { 0.0, 0.0, 0.0 };
return temp;
}
We can make our function slightly better by returning a temporary (unnamed/anonymous) object instead:
Point3d getZeroPoint()
{
return Point3d { 0.0, 0.0, 0.0 }; // return an unnamed Point3d
}
Deducing the return type
In the case where the function has an explicit return type (e.g. Point3d), we can even omit the type in the return statement:
Point3d getZeroPoint()
{
// We already specified the type at the function declaration
// so we don't need to do so here again
return { 0.0, 0.0, 0.0 }; // return an unnamed Point3d
}
Also note that since in this case we are returning all zero values, we can use empty braces to return a value-initialized Point3d:
Point3d getZeroPoint()
{
// We can use empty curly braces to value-initialize all members
return {};
}
Structs with program-defined members
In C++, structs (and classes) can have members that are other program-defined types. There are two ways to do this:
- First, we can define one program-defined type (in the global scope) and then use it as a member of another program-defined type:
#include <iostream>
struct Employee
{
int id {};
int age {};
double wage {};
};
struct Company
{
int numberOfEmployees {};
Employee CEO {}; // Employee is a struct within the Company struct
};
int main()
{
Company myCompany{ 7, { 1, 32, 55000.0 } }; // Nested initialization list to initialize Employee
std::cout << myCompany.CEO.wage << '\n'; // print the CEO's wage
return 0;
}
In the above case, we have defined an Employee struct, and then used that as a member in a Company struct. When we initialize our Company, we can also initialize our Employee by using a nested initialization list. And if we want to know what the CEO's salary was, we simply use the member selection operator twice: myCompany.CEO.wage;
- Second, types can also be nested inside other types, so if an Employee only existed as part of a Company, the Employee type could be nested inside the Company struct:
#include <iostream>
struct Company
{
struct Employee // accessed via Company::Employee
{
int id{};
int age{};
double wage{};
};
int numberOfEmployees{};
Employee CEO{}; // Employee is a struct within the Company struct
};
int main()
{
Company myCompany{ 7, { 1, 32, 55000.0 } }; // Nested initialization list to initialize Employee
std::cout << myCompany.CEO.wage << '\n'; // print the CEO's wage
return 0;
}
Struct size and data structure alignment
Typically, the size of a struct is the sum of the sizes of all its members, but not always.
Consider the following program:
#include <iostream>
struct Foo
{
short a {};
int b {};
double c {};
};
int main()
{
std::cout << "The size of short is " << sizeof(short) << " bytes\n";
std::cout << "The size of int is " << sizeof(int) << " bytes\n";
std::cout << "The size of double is " << sizeof(double) << " bytes\n";
std::cout << "The size of Foo is " << sizeof(Foo) << " bytes\n";
return 0;
}
On the author's machine this printed:
The size of short is 2 bytes
The size of int is 4 bytes
The size of double is 8 bytes
The size of Foo is 16 bytes
Note that the size of short + int + double is 14 bytes, but the size of Foo is 16 bytes.
It turns out we can only say that the size of a struct will be at least as large as the sum of the sizes of all the variables it contains. But it could be larger: for performance reasons, the compiler will sometimes add gaps into structures (this is called padding).
Structure padding
Structure padding is a concept in computer programming and, more specifically, in languages like C and C++, where the memory allocated for a structure or class may include unused bytes between the members. This is done by the compiler for optimization purposes and to ensure proper alignment of the data within the structure.
The primary reasons for structure padding are:
Memory alignment: refers to the requirement that data in memory be stored at addresses that are a multiple of the type's alignment, which for fundamental types is typically equal to their size. For example, on a 32-bit system, 4-byte data types (like int) should ideally start at memory addresses that are multiples of 4.