Data Type
A data type in programming is a classification that specifies what kind of value a variable can hold.
A data type determines the size, the range, and the interpretation of a variable's data.
The main purpose of a data type is to tell the compiler how to store and interpret data in memory. For example, when we declare a variable as an integer, we are telling the compiler that the value at that memory location occupies a fixed number of bytes (typically 4 bytes, or 32 bits, for int) and should be interpreted as an integer. This tells the compiler how much data to write and how much to read.
At the lowest level, data is stored as a sequence of 1s and 0s, and nothing in the bits themselves marks where one variable's data ends and the next one's begins. Data types supply that information.
🤔 The Role of Data Types
- Defining Boundaries:
- Data types specify how many bytes of memory a particular variable will occupy. This helps the compiler determine where the data for one variable ends and the next one begins.
- Interpreting Bit Sequences:
- Data types provide a way to interpret sequences of bits in a meaningful way, translating binary data into human-readable values.
int number = 65;
- The integer value 65 is stored as the binary sequence 0100 0001. The data type int tells the compiler to interpret this bit sequence as a 32-bit integer.
For example, consider how a variable is laid out in memory.
Memory is byte-addressable, meaning the smallest unit of memory that can be addressed is 1 byte (8 bits). Picture each byte as one block.
Now we declare and initialize a variable var of type int.
int var = 7;
This tells the compiler to store the value 7 (0111 in binary) in the memory referred to by var. Because its type is int, the value occupies 4 bytes of memory (4 blocks of 1 byte each). This also tells us where var ends: at its start address plus 4 bytes.
In this example the value is stored at memory location 0x0002, which is the address of var. Since its type is int, the compiler stores the value 7 in four consecutive bytes starting at 0x0002. The order of those bytes depends on the architecture, which may be little-endian or big-endian. Here the architecture is little-endian, so the least-significant byte is stored at the lowest address: the bytes at 0x0002 through 0x0005 are 00000111 00000000 00000000 00000000. Read as a single 32-bit value, 7 is 00000000 00000000 00000000 00000111.
The data type is just as useful when reading the value of var, because it tells the compiler how many bytes to read. Since var is an int, the compiler knows it has to read four bytes (32 bits) starting at the address of var, which is 0x0002. In this way the compiler knows how much data to write at a location and how much to read back.
Uses of Data Type
1 Memory Allocation:
Data types help the compiler allocate the correct amount of memory for a variable. Each data type has a specific size, ensuring that the right amount of memory is reserved for storing its value.
int age = 25; // Typically uses 4 bytes
char grade = 'A'; // Typically uses 1 byte
2 Data Integrity (Type Safety):
Data types ensure that only valid data is stored in a variable. This helps maintain the integrity and consistency of data throughout the program.
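int age = 25.5; // Compiles, but the value is silently truncated to 25 (most compilers can warn about this)
// Brace initialization, int age{25.5};, is rejected as a narrowing conversion.
// Use an explicit cast when the conversion is intended, since converting between types can lose data (e.g., decimal truncation).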
3 Correct Operations:
Data types ensure that the operations performed on variables are appropriate and meaningful. This helps prevent errors and ensures the program behaves as expected.
int a = 10, b = 5;
int sum = a + b; // Arithmetic operation
bool isEqual = (a == b); // Comparison operation
Categories of Data Types
Data Types can be categorized broadly into the following categories:
🥇 Fundamental (Basic | Built-in | Primary | Primitive) Data Types
These are the basic data types provided by C++.
By default, integer types other than char are signed, meaning they can represent both negative and positive values. Unsigned integers can only represent non-negative values, which lets them store a larger positive range in the same number of bits.
Signed integers are represented using two's complement notation (guaranteed since C++20, and universal in practice before that). The most significant bit indicates the sign, and it carries a weight of -2^(n-1) rather than acting as a separate sign flag in front of a magnitude.
Two's Complement:
- The most significant bit (MSB) is the sign bit:
  - If the sign bit is 0, the number is non-negative.
  - If the sign bit is 1, the number is negative.
- The value of a signed integer is determined as follows:
  - For a positive number or zero, the value is read exactly as in an unsigned integer.
  - For a negative number, invert all the bits and add one to the result to get the magnitude.
1️⃣ Integer Types:
char: (Short for Character)
The char data type is used to store a single character (such as a letter, digit, punctuation mark, or special symbol). Internally, a char is represented by an integer that corresponds to the character's code, usually its ASCII (American Standard Code for Information Interchange) value.
ASCII and Unicode:
- ASCII: In ASCII, characters are represented using 7 bits. The char data type can represent these ASCII characters directly.
- Unicode: To represent a wider range of characters beyond ASCII (such as international characters), you might use wchar_t, char16_t, or char32_t, depending on the encoding (for example UTF-16 or UTF-32).
Size of char:
The size of the char data type is always 1 byte, because the C++ standard defines sizeof(char) as 1. A byte is 8 bits on virtually all current systems (the standard requires at least 8). This size does not change with the system architecture (whether it is 32-bit or 64-bit).
- Size: 1 byte = 8 bits.
  - A type with n bits can hold 2^n different values, so a char can hold 2^8 = 256 different values.
Range of char:
The range of a char depends on whether it is signed or unsigned.
The C++ standard defines three distinct character types:
- char: The default character type, which can be either signed or unsigned depending on the compiler and target platform. With g++ it is signed on x86 and x86-64, but unsigned on some other targets such as ARM Linux.
- signed char: Explicitly signed, which means it can represent negative values.
- unsigned char: Explicitly unsigned, which means it can only represent non-negative values.
(Ⅰ) Signed char:
Whether plain char is signed is implementation-defined, so write signed char or unsigned char when the signedness matters.
On systems where char is treated as a signed data type, the range is from -128 to 127. The most significant bit (MSB) indicates the sign (0 for non-negative, 1 for negative), and the remaining 7 bits determine the value.
- A signed char uses 1 byte (8 bits), and the MSB indicates the sign.
  - If the MSB is 0, the number is non-negative.
  - If the MSB is 1, the number is negative.
  - The remaining 7 bits determine the value within that half of the range.
Range can be calculated using the below formula:
Range = -2^(n-1) to (2^(n-1)-1)
= -2^(8-1) to (2^(8-1)-1)
= -2^7 to (2^7-1)
= -128 to 127
Positive Range:
- When the MSB is 0, the remaining 7 bits can form 2^7 = 128 distinct non-negative values.
- These 128 values are split into:
  - One value for 0 (0 is counted with the positive range).
  - Positive values from 1 to 127.
- Thus the highest value in the positive range is 127, which is 2^7 - 1.
0 1 2 3 4 ... 125 126 127
Negative Range:
- When the MSB is 1, the remaining 7 bits can form 2^7 = 128 distinct negative values, from -128 to -1.
-128 -127 -126 -125 -124 ... -3 -2 -1
(Ⅱ) Unsigned char:
With unsigned char, all 8 bits are used to represent the value. By the formula 2^n, it can hold 2^8 = 256 different values, which gives a range from 0 to 255. Counting 0, the values 0 through 255 add up to 256.
Let's calculate its range:
Range of unsigned = 0 to (2^n - 1)
= 0 to (2^8 - 1)
= 0 to (256 - 1)
= 0 to 255
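8-Bit Sign-Magnitude (shown for comparison; C++ signed types use two's complement, in which 10000000 is -128 and 11111111 is -1):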
Binary Value | Sign-Magnitude Interpretation | Unsigned Interpretation |
---|---|---|
00000000 | +0 | 0 |
00000001 | +1 | 1 |
⋮ | ⋮ | ⋮ |
01111101 | +125 | 125 |
01111110 | +126 | 126 |
01111111 | +127 | 127 |
10000000 | -0 | 128 |
10000001 | -1 | 129 |
10000010 | -2 | 130 |
⋮ | ⋮ | ⋮ |
11111101 | -125 | 253 |
11111110 | -126 | 254 |
11111111 | -127 | 255 |
short:
- Definition: short is a data type in C++ that stores integer values. It is often used when a smaller range of integer values is enough and you want to save memory compared to the int type.
- Purpose: To provide a more memory-efficient integer representation for scenarios where large ranges of values are not necessary.
Size:
- The size of short is implementation-dependent but is typically 2 bytes (16 bits) on most systems. The C++ standard guarantees that short is at least 16 bits, but it could be larger on some systems.
- Since it uses 16 bits for its representation, it can store 2^16 = 65536 different values.
- The number of distinct values follows from the formula:
number of values = 2^n
= 2^16
= 65536
Range:
As shown above, a short is 16 bits (2 bytes), which means it can store 2^16 = 65536 different values. However, a short can be signed or unsigned.
(Ⅰ) Signed short:
It can store both positive and negative values.
The MSB (Most Significant Bit) indicates the sign of the value: 0 means the value is non-negative, and 1 means the value is negative. That leaves 15 bits to determine the value.
Those 15 bits give 2^15 = 32768 combinations, so a signed short can store 32768 non-negative values and 32768 negative values.
This is divided into two ranges:
Positive Range:
- When the MSB is 0, the remaining 15 bits can form 2^15 = 32768 distinct non-negative values.
- These 32768 values are split into:
  - One value for 0 (0 is counted with the positive range).
  - Positive values from 1 to 32767.
- Thus the positive range runs from 0 to 32767, which is 32768 distinct values.
Negative Range:
- When the MSB is 1, the remaining 15 bits can form 2^15 = 32768 distinct negative values.
  - The negative range is from -1 to -32768.
  - Note: 0 is counted with the positive range.
-32,768 <-------> -1 | 0 <-----------> 32,767
Negative Range Positive Range
Together = Negative Range + Positive Range
= 65536
(Ⅱ) Unsigned short:
An unsigned short uses all 16 bits for magnitude (no sign bit), meaning it can only represent non-negative values.
It can store 2^16 = 65536 distinct values, from 0 to 65535.
We can find the unsigned range using the formula below:
unsigned range = 0 to (2^n) - 1
= 0 to (2^16) - 1
= 0 to (65536 - 1)
= 0 to 65535
Minimum value = 0
Maximum value = 65535
int:
- Definition: The int data type is a fundamental data type in C++ that stores integer values (positive and negative whole numbers, including zero).
- Purpose: It is used when you need to work with integer numbers in your program, and it is generally an efficient choice for integer arithmetic on most platforms.
Size:
The size of int is platform-dependent.
- Typically, on modern 32-bit or 64-bit systems, int is 4 bytes (32 bits).
- The C++ standard guarantees that int is at least 2 bytes (16 bits), but the actual size can vary depending on the architecture.
Platform | Size of int |
---|---|
16-bit | 2 bytes (16 bits) |
32-bit | 4 bytes (32 bits) |
64-bit | 4 bytes (32 bits) |
On a 32-bit system, int is 4 bytes (32 bits), so it can store 2^32 = 4,294,967,296 different values.
Like the other integer types, it can be signed or unsigned.
(Ⅰ) Signed int:
- A signed int can represent both negative and positive numbers, including zero.
- By default, when you declare an int, it is treated as a signed integer.
Number of values per sign of signed int:
- Since the MSB is reserved for the sign, we are left with 31 bits to determine the value.
- Thus the number of distinct values for each sign is as follows:
values per sign = 2 ^ (n-1)
= 2 ^ 31
= 2,147,483,648
Range of signed int:
- Since the MSB is used to specify the sign of the value, we are left with 31 bits to determine the value.
- The values are divided into two ranges, positive and negative.
Positive Range:
- We are left with 31 bits to determine the value.
- That gives 2^31 = 2,147,483,648 distinct values.
- The positive range runs from 0 to 2^(n-1) - 1.
- Note: n in the formula is 32.
Possible distinct values = 2^31
= 2,147,483,648
Minimum Positive Range = 0
Maximum Positive Range = (2^31) - 1
= 2,147,483,648 - 1
= 2,147,483,647
Note: 1 is subtracted from the maximum of the positive range because 0 is also counted as part of the positive range.
Negative Range:
- The negative range runs from -1 to -2^(n-1).
Possible distinct values = 2^(n-1)
= 2^(32-1)
= 2^31
= 2,147,483,648
Minimum Negative Range = -1
Maximum Negative Range = -(2^(32-1))
= -(2^31)
= -2,147,483,648
Signed int (32-bit) | Range |
---|---|
Negative Range | -2,147,483,648 to -1 |
Positive Range | 0 to 2,147,483,647 |
(Ⅱ) Unsigned int:
An unsigned int can only represent non-negative values (zero and positive values). There is no bit reserved for the sign, so all 32 bits are used for the magnitude. You can explicitly declare an unsigned integer using the unsigned keyword.
The number of distinct values of unsigned int is as follows:
number of values = 2^n
= 2^32
= 4,294,967,296
Range of unsigned int:
- 0 is the first non-negative number.
- The range is therefore as follows:
Possible distinct values = 2^32
= 4,294,967,296
Minimum value = 0
Maximum value = (2^n)-1
= (2^32)-1
= 4,294,967,296 - 1
= 4,294,967,295
Type | Can Represent | Range (for 32-bit) |
---|---|---|
Unsigned int | Zero, positive | 0 to 4,294,967,295 |
long:
- At least 4 bytes. It is 8 bytes on most 64-bit Linux and macOS systems, and 4 bytes on Windows.
- Can be signed or unsigned.
- Range of long (when it is 4 bytes):
  - signed long: -2,147,483,648 to 2,147,483,647
  - unsigned long: 0 to 4,294,967,295 (larger on systems where long is 8 bytes).
- Used for larger integers.
long long:
- At least 8 bytes.
- Can be signed or unsigned.
- Range of long long:
  - signed long long: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
  - unsigned long long: 0 to 18,446,744,073,709,551,615.
- Used for very large integers.
2️⃣ Floating-Point Types:
3️⃣ Boolean Type:
bool is a built-in type in C++.
- Size:
  - The size of a bool can vary depending on the compiler and platform, but it is typically 1 byte.
  - The C++ standard does not specify the exact size, but it guarantees that bool is large enough to hold the values true and false.
- Values:
  - true: Represents a logical true value.
  - false: Represents a logical false value.
- From Other Types:
  - Non-zero integer values are converted to true.
  - Zero is converted to false.
  - For example, in a condition, the expressions (5) and (0) evaluate to true and false respectively.
4️⃣ Void Type:
It is a special type that represents the absence of a value. It is used in several contexts:
- Void Functions
  - When a function does not return a value, it is declared with the void return type:
void printMessage() { std::cout << "Hello, World!" << std::endl; }
- Void Pointers
  - A void pointer, or void*, is a pointer that can point to an object of any data type. It is used when the type of the data being pointed to is not known or is irrelevant to the function being written. It is often used for generic programming and functions that handle data of unknown type. However, before dereferencing a void*, it must be cast to another pointer type.
void* ptr;
int nume = 10;
ptr = &nume; // void pointer pointing to an int
// To use the pointer, it needs to be cast to the correct type
int* intPtr = static_cast<int*>(ptr);
std::cout << *intPtr << std::endl; // outputs 10
🥈 Derived Data Types
These types are derived from the fundamental data types:
1️⃣ Arrays
An array is a collection of elements of the same type stored in contiguous memory locations. Arrays allow you to store multiple values of the same type together.
int numbers[5] = {1, 2, 3, 4, 5};
char letters[3] = {'a', 'b', 'c'};
- Accessing Elements: You can access array elements using indices, starting from 0.
int firstNumber = numbers[0]; // Access the first element
letters[1] = 'z'; // Modify the second element
2️⃣ Pointers
A pointer is a variable that stores the memory address of another variable. Pointers are used for dynamic memory allocation, passing variables by reference, and working with arrays and functions.
int var = 10;
int* ptr = &var; // Pointer to an integer, storing the address of var
- Dereferencing Pointers: You can access the value stored at the pointer's address using the dereference operator *.
int value = *ptr; // Dereference the pointer to get the value of var
- Null Pointer: A pointer set to nullptr points to no object. A pointer that is never initialized holds an indeterminate value rather than null, so initialize pointers explicitly.
int* nullPtr = nullptr;
3️⃣ References
A reference is an alias for another variable. Once a reference is initialized to a variable, it cannot be changed to refer to another variable. References are used for passing arguments to functions by reference.
int num = 10;
int& ref = num; // Reference to the variable num
- Using References: You can use a reference like the original variable.
ref = 20; // This will change the value of num to 20
4️⃣ Function Types
Functions in C++ have types determined by their return type and parameter list. Function pointers can store the address of a function and be used to call the function indirectly.
- Function Declaration:
int add(int a, int b) { return a + b; }
- Function Pointer:
int (*funcPtr)(int, int) = &add;
int result = funcPtr(2, 3); // Calls the add function through the function pointer
🥉 User-Defined Data Types
These types are defined by the user to create more complex data structures:
1️⃣ Classes
Classes are blueprints for creating objects.
- By default, the members of a class are private, whereas in a struct they are public.
#include <iostream>
#include <string> // needed for std::string
using namespace std;
class Car {
private:
    string brand;
    int year = 0; // default value so display() is safe before setYear()
public:
    void setBrand(string b) {
        brand = b;
    }
    void setYear(int y) {
        year = y;
    }
    // const: display() does not modify the object
    void display() const {
        cout << "Brand: " << brand << endl;
        cout << "Year: " << year << endl;
    }
};
int main() {
    Car car1;
    car1.setBrand("Toyota");
    car1.setYear(2020);
    car1.display();
    return 0;
}
2️⃣ Structures
A structure is a collection of variables of different types under a single name. Structures are used to represent a record.
Similar to classes but with public members by default.
#include <iostream>
#include <string> // needed for std::string
using namespace std;
struct Person {
    string name;
    int age;
};
int main() {
    Person p1;
    p1.name = "John"; // members are public, so they can be set directly
    p1.age = 30;
    cout << "Name: " << p1.name << endl;
    cout << "Age: " << p1.age << endl;
    return 0;
}
3️⃣ Unions
A union is a special data type that allows storing different data types in the same memory location. Only one member can hold a value at a time, and the memory occupied by a union is at least the size of its largest data member.
#include <iostream>
using namespace std;
union Data {
    int intValue;
    float floatValue;
    char charValue;
};
int main() {
    Data data;
    data.intValue = 10; // intValue is the active member
    cout << "Integer: " << data.intValue << endl;
    data.floatValue = 3.14f; // overwrites the same memory; floatValue is now active
    cout << "Float: " << data.floatValue << endl;
    data.charValue = 'A'; // overwrites again; charValue is now active
    cout << "Character: " << data.charValue << endl;
    return 0;
}
4️⃣ Enumerations
An enumeration is a user-defined data type that consists of integral constants. Each integral constant is given a name, which makes the code more readable and maintainable.
#include <iostream>
using namespace std;
enum Color {
    RED,   // 0
    GREEN, // 1
    BLUE   // 2
};
int main() {
    Color c = GREEN;
    if (c == GREEN) {
        cout << "The color is green." << endl;
    }
    return 0;
}
5️⃣ Typedef
Used to give a new name to an existing data type.
typedef unsigned int uint;