Data Types in C++

Data Type

A data type in programming is a classification that specifies the type of data (value) that a variable can hold.

A data type specifies the size, the range of values, and the kind of data that a variable holds.

The main purpose of a data type is to tell the compiler how to store and interpret data in memory. For example, when we declare a variable as an int, we are telling the compiler that the data at that variable's memory location occupies a fixed number of bytes (typically 4 bytes, or 32 bits) and should be interpreted as an integer. The compiler therefore knows how much data to write and how much to read.

At the lowest level, data is stored as a sequence of 1s and 0s. The bits themselves give no indication of where one variable's data ends and the next one's begins; the data type provides that information.

🤔 The Role of Data Types

  • Defining Boundaries:
    • Data types specify how many bytes of memory a particular variable will occupy. This helps the compiler determine where the data for one variable ends and the next one begins.
  • Interpreting Bit Sequences:
    • Data types provide a way to interpret sequences of bits in a meaningful way, translating binary data into human-readable values.
    • int number = 65;
      
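      • The integer value 65 is stored as the binary sequence 0100 0001 (with leading zero bits filling the rest of the int). The data type int tells the compiler to interpret these bits as an integer, typically 32 bits wide; read as a char, the same low byte would mean the character 'A' in ASCII.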
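
The point about interpreting bit sequences can be sketched as follows. This is a minimal illustration that assumes an ASCII-compatible character set; the helper names asCharacter and asNumber are made up for this example.

```cpp
// Hypothetical helpers: the same stored number, read through two different types.
// Assumes an ASCII-compatible character set, where code 65 is 'A'.

// Interpret a small integer code as a character.
constexpr char asCharacter(int code) {
    return static_cast<char>(code);
}

// Interpret a character as its underlying integer code.
constexpr int asNumber(char symbol) {
    return static_cast<int>(symbol);
}
```

Under that assumption, asCharacter(65) yields 'A' and asNumber('A') yields 65: the stored bits are the same, and only the type used to read them differs.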

For example, consider the memory layout below.

Memory is byte-addressable, meaning the smallest unit of memory that has its own address is 1 byte (8 bits). In the screenshot below, each block is one byte.

image-212.png

Now we declare and initialize a variable var of type int.

int var = 7;

This tells the compiler to store the value 7 (0111 in binary) in the memory reserved for var and to treat it as an integer, which here means the value occupies 4 bytes of memory (4 blocks of 1 byte). We therefore know where var ends: its start address plus 4 bytes.

image-215.png

As the diagram above shows, the value is stored starting at memory location 0x0002, which is the address of var. Since its type is int, the compiler stores the value 7 in four consecutive bytes starting at 0x0002. The order of those bytes depends on the architecture, which may be little-endian or big-endian. Our example assumes a little-endian architecture, which stores the least-significant byte at the lowest address. Written as a 32-bit number, 7 is 00000000 00000000 00000000 00000111; in little-endian memory the byte 00000111 sits at 0x0002 and the three zero bytes follow at 0x0003 to 0x0005.

Data types matter for reading as well: they tell the compiler how many bytes to read. Since var is an int, the compiler knows it has to read four bytes (32 bits) starting at the address of var, 0x0002. In this way the compiler knows how much data to write at a location and how much to read back.
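
The byte layout described above can be inspected directly. The following is a minimal sketch, not a definitive tool; the helper names bytesInMemory and isLittleEndian are hypothetical, and the result depends on the machine it runs on.

```cpp
#include <array>
#include <cstring>

// Hypothetical helper: copy the bytes of an int, in memory order,
// into an array so they can be inspected one by one.
std::array<unsigned char, sizeof(int)> bytesInMemory(int value) {
    std::array<unsigned char, sizeof(int)> bytes{};
    std::memcpy(bytes.data(), &value, sizeof(int));
    return bytes;
}

// True when the least-significant byte sits at the lowest address.
bool isLittleEndian() {
    return bytesInMemory(1).front() == 1;
}
```

On a little-endian machine, bytesInMemory(7) starts with the byte 7 followed by zero bytes; on a big-endian machine the byte 7 comes last.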

Uses of Data Type

1 Memory Allocation:

Data types help the compiler allocate the correct amount of memory for a variable. Each data type has a specific size, ensuring that the right amount of memory is reserved for storing its value.

int age = 25; // Typically uses 4 bytes
char grade = 'A'; // Typically uses 1 byte
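
The sizes in the comments above are typical rather than guaranteed. A small sketch of how to ask the compiler for the real numbers follows; bitsIn is a hypothetical helper name, while sizeof and CHAR_BIT are standard.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical helper: number of bits an object of type T occupies.
template <typename T>
constexpr std::size_t bitsIn() {
    return sizeof(T) * CHAR_BIT;
}

// Guarantees the C++ standard makes on every platform.
static_assert(sizeof(char) == 1, "char is always exactly one byte");
static_assert(bitsIn<int>() >= 16, "int is at least 16 bits wide");
static_assert(sizeof(int) >= sizeof(char), "int is never smaller than char");
```

Printing bitsIn<int>() on a given machine shows the width that compiler actually chose for int.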

2 Data Integrity (Type Safety):

Data types ensure that only valid data is stored in a variable. This helps maintain the integrity and consistency of data throughout the program.

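int age = 25.5; // Compiles, but the value is silently truncated to 25 (many compilers warn)

// Brace initialization, int age{25.5};, rejects this narrowing conversion at compile time.
// If the conversion is intended, make it explicit with static_cast<int>(25.5),
// because converting between types can lose data (e.g., decimal truncation).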
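
The data loss mentioned above can be made visible with an explicit conversion. This is a minimal sketch; toWholeNumber is a hypothetical helper name.

```cpp
// Hypothetical helper: an explicit, visible conversion from double to int.
// The fractional part is discarded (truncation toward zero), not rounded.
constexpr int toWholeNumber(double value) {
    return static_cast<int>(value);
}

// int age{25.5};   // would not compile: brace initialization rejects narrowing
```

Both 25.5 and 25.9 become 25, and -25.5 becomes -25, because the conversion drops the fraction instead of rounding.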

3 Correct Operations:

Data types ensure that the operations performed on variables are appropriate and meaningful. This helps prevent errors and ensures the program behaves as expected.

int a = 10, b = 5;
int sum = a + b; // Arithmetic operation
bool isEqual = (a == b); // Comparison operation

Categories of Data Types

Data Types can be categorized broadly into the following categories:

🥇 Fundamental (Basic | Built-in | Primary | Primitive) Data Types

These are the basic data types provided by C++.

By default, integer types such as short, int, and long are signed, meaning they can represent both negative and positive values. Unsigned integers can only represent non-negative values, which lets them store a larger positive range in the same number of bits.

Signed integers are represented using two's complement notation (required since C++20, and used by virtually every implementation before that). The most significant bit indicates the sign, but it is not a separate flag in front of a magnitude: in an n-bit type it carries a negative weight of -2^(n-1).

Two's Complement:

  • The most significant bit (MSB) is the sign bit:
    • If the sign bit is 0, the number is non-negative.
    • If the sign bit is 1, the number is negative.
  • The value of a signed integer is determined as follows:
    • For a positive number or zero, the value is straightforward, just as in unsigned integers.
    • For a negative number, invert all the bits and add one to the result to get the magnitude.
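
The invert-and-add-one rule can be tried on 8-bit patterns. The sketch below uses hypothetical helper names (negateBits, decodeTwosComplement) and works on std::uint8_t so that every step is well-defined unsigned arithmetic.

```cpp
#include <cstdint>

// Negate an 8-bit pattern the two's complement way: invert all bits, add one.
constexpr std::uint8_t negateBits(std::uint8_t bits) {
    return static_cast<std::uint8_t>(~bits + 1u);
}

// Read an 8-bit pattern as a two's complement value:
// patterns with the sign bit set (128..255) stand for value - 256.
constexpr int decodeTwosComplement(std::uint8_t bits) {
    return bits >= 128 ? static_cast<int>(bits) - 256 : static_cast<int>(bits);
}
```

For example, negating the pattern 0000 0101 (5) gives 1111 1011, and decoding 1111 1011 gives -5.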

1️⃣ Integer Types:

char: (Short for Character)

The char data type is used to store a single character (such as a letter, digit, punctuation mark, or special symbol). Internally, a char is a small integer that holds the character's code, which on most systems is its ASCII (American Standard Code for Information Interchange) value.

ASCII and Unicode:
  • ASCII: In ASCII (American Standard Code for Information Interchange), characters are represented using 7 bits. The char data type is capable of representing these ASCII characters directly.
  • Unicode: For representing a wider range of characters beyond ASCII (like international characters), you might use wchar_t, char16_t, or char32_t depending on the encoding (UTF-16, UTF-32).
Size of char:

sizeof(char) is always exactly 1 byte; the C++ standard defines it that way, so it does not change with the system architecture (32-bit or 64-bit). A byte is at least 8 bits wide (CHAR_BIT in <climits>), and it is exactly 8 bits on virtually all modern systems.

  • Size: 1 byte = 8 bits.
    • With n bits, a type can hold 2^n different values, so a char can hold 2^8 = 256 different values.
Range of char:

The range of a char depends on whether it is signed or unsigned.

The C++ standard defines three distinct character types:

  • char: This is the default character type, which can be either signed or unsigned depending on the compiler and target platform. With g++ it is signed on x86 and x86-64, but unsigned on some other targets such as ARM Linux.
  • signed char: Explicitly signed, which means it can represent negative values.
  • unsigned char: Explicitly unsigned, which means it can only represent non-negative values.

(Ⅰ) Signed char:

Plain char behaves like signed char on many platforms, but this is not guaranteed, so write signed char explicitly when negative values matter.

In systems where char is treated as a signed data type, the range is from -128 to 127. The most significant bit (MSB) indicates the sign (0 for non-negative, 1 for negative), and the remaining 7 bits contribute the rest of the value.

  • A signed char uses 1 byte (8 bits), and the MSB indicates the sign.
    • If the MSB is 0, the number is zero or positive.
    • If the MSB is 1, the number is negative.
  • In two's complement the MSB carries a weight of -128, and the remaining 7 bits add a value from 0 to 127 to it.

Range can be calculated using the below formula:

Range = -2^(n-1) to (2^(n-1)-1)
      = -2^(8-1) to (2^(8-1)-1)
      = -2^7 to (2^7-1)
      = -128 to 127
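
The formula above can be written as two small functions. This is a sketch under the assumption of two's complement; signedMin and signedMax are hypothetical helper names.

```cpp
// Hypothetical helpers implementing the range formula for an n-bit signed type.
// Valid for 1 <= bits <= 63, so the shifted value fits in a long long.
constexpr long long signedMin(int bits) {
    return -(1LL << (bits - 1));
}

constexpr long long signedMax(int bits) {
    return (1LL << (bits - 1)) - 1;
}
```

For 8 bits these give -128 and 127, which is what std::numeric_limits<signed char> from <limits> reports on systems with an 8-bit char.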

Positive Range:

  • When the MSB is 0, the remaining 7 bits give 2^7 = 128 distinct non-negative values:
    • These 128 values are split into:
      • One value for 0 (0 is counted with the positive range).
      • Positive values from 1 to 127.
    • Thus the highest value in the positive range is 127, which is 2^7 - 1.
  • 0     1     2     3     4     ...   125   126   127
    

Negative Range:

  • When the MSB is 1, the remaining 7 bits give 2^7 = 128 distinct negative values, from -128 to -1.
  • -128  -127  -126  -125  -124  ...  -3  -2  -1
    

(Ⅱ) Unsigned char:

When char is used as unsigned char, all 8 bits are used to represent the value. By the formula 2^n, it can hold 2^8 = 256 different values, which gives the range 0 to 255 (256 values in total, counting 0).

Let's calculate its range:

Range of unsigned = 0 to (2^n - 1)
                  = 0 to (2^8 - 1)
                  = 0 to (256 - 1)
                  = 0 to 255
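
The same calculation can be expressed in code. The sketch below uses a hypothetical helper, unsignedMax, and compares it with what the standard library reports.

```cpp
#include <limits>

// Hypothetical helper implementing 2^n - 1 for an n-bit unsigned type.
// Valid for 1 <= bits <= 63, so the shifted value fits in an unsigned long long.
constexpr unsigned long long unsignedMax(int bits) {
    return (1ULL << bits) - 1;
}

// The library reports the same maximum for the real type.
static_assert(std::numeric_limits<unsigned char>::digits != 8 ||
                  unsignedMax(8) == std::numeric_limits<unsigned char>::max(),
              "formula matches unsigned char when it has 8 value bits");
```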
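8-Bit Interpretations:
Binary Value  Sign-Magnitude  Two's Complement  Unsigned
00000000      +0              0                 0
00000001      +1              1                 1
01111101      +125            125               125
01111110      +126            126               126
01111111      +127            127               127
10000000      -0              -128              128
10000001      -1              -127              129
10000010      -2              -126              130
11111101      -125            -3                253
11111110      -126            -2                254
11111111      -127            -1                255
Note: sign-magnitude is shown for comparison only. C++ uses two's complement, which is why a signed char reaches -128 and has no negative zero.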

short:

  • Definition: short is a data type in C++ that stores integer values. It is often used when you need a smaller range of integer values to save memory compared to the int type.
  • Purpose: To provide a more memory-efficient integer representation for scenarios where large ranges of values are not necessary.
Size:
  • The size of short is implementation-dependent but is typically 2 bytes (16 bits) on most systems. The C++ standard guarantees that short will be at least 16 bits, but it could be larger on some systems.
  • Assuming 16 bits, it can store 2^16 = 65536 different values.
  • The number of distinct values follows from the formula:
Number of values = 2^n
                 = 2^16
                 = 65536
Range:

Since a short is 16 bits (2 bytes), it can store 2^16 = 65536 different values. A short can be signed or unsigned.

(Ⅰ) Signed short:

It can store both positive and negative values.

The MSB (most significant bit) indicates the sign of the value: 0 means the value is non-negative, and 1 means it is negative. That leaves 15 bits for the rest of the value.

Those 15 bits give 2^15 = 32768 distinct values for each setting of the sign bit: 32768 non-negative values and 32768 negative values.

The full range is therefore divided into two parts:

Positive Range:

  • When the MSB is 0, the remaining 15 bits give 2^15 = 32768 distinct non-negative values:
    • These 32768 values are split into:
      • One value for 0 (0 is counted with the positive range).
      • Positive values from 1 to 32767.
    • Thus the positive range runs from 0 to 32767, which is 32768 distinct values.

Negative Range:

  • When the MSB is 1, the remaining 15 bits give 2^15 = 32768 distinct negative values.
    • The negative range is from -32768 to -1.
    • Note: 0 is counted with the positive range.
-32,768  <------->  -1  |  0  <----------->  32,767
      Negative Range             Positive Range

Together = Negative Range + Positive Range
         = 32768 + 32768
         = 65536

(Ⅱ) Unsigned Short:

An unsigned short uses all 16 bits for magnitude (no sign bit), meaning it can only represent non-negative values.

It can store 2^16 = 65536 distinct values, starting at 0 and ending at 65535.

We can find the unsigned range using the below formula:

unsigned range = 0 to (2^n) - 1
               = 0 to (2^16) - 1
               = 0 to (65536 - 1)
               = 0 to 65535
 Minimum value = 0
 Maximum value = 65535
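
Because the signed and unsigned ranges cover the same bit patterns, converting between them reinterprets the value rather than clamping it. A minimal sketch, with toUnsignedShort as a hypothetical helper name:

```cpp
#include <limits>

// Converting a negative short to unsigned short wraps modulo 2^n,
// so -1 maps to the largest unsigned value.
constexpr unsigned short toUnsignedShort(short value) {
    return static_cast<unsigned short>(value);
}
```

With a 16-bit short, toUnsignedShort(-1) is 65535, while non-negative values such as 100 are unchanged.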

 

int:

  • Definition: The int data type is a fundamental data type in C++ that stores integer values (positive and negative whole numbers, including zero).
  • Purpose: It is used when you need to work with integer numbers in your program, and it is usually the natural, most efficient integer type for the target platform.
Size:

The size of int is platform-dependent.

  • Typically, on modern 32-bit or 64-bit systems, int is 4 bytes (32 bits).
  • The C++ standard guarantees that int will be at least 2 bytes (16 bits), but it can vary depending on the architecture.
Platform    Size of int
16-bit      2 bytes (16 bits)
32-bit      4 bytes (32 bits)
64-bit      4 bytes (32 bits)

With a 4-byte (32-bit) int, it can store 2^32 = 4,294,967,296 different values.
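
When the exact width matters, the fixed-width aliases from <cstdint> avoid the platform dependence of int. The sketch below only checks sizes; intIs32Bits is a hypothetical helper name.

```cpp
#include <climits>
#include <cstdint>

// Fixed-width aliases from <cstdint> have the same size on every platform
// that provides them, unlike int, whose width is chosen by the implementation.
static_assert(sizeof(std::int16_t) * CHAR_BIT == 16, "exactly 16 bits");
static_assert(sizeof(std::int32_t) * CHAR_BIT == 32, "exactly 32 bits");
static_assert(sizeof(std::int64_t) * CHAR_BIT == 64, "exactly 64 bits");

// Hypothetical helper: reports whether int on this platform is 32 bits wide.
constexpr bool intIs32Bits() {
    return sizeof(int) * CHAR_BIT == 32;
}
```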

 

Like the other integer types, int can be signed or unsigned.

(Ⅰ) Signed int:

  • A signed int can represent both negative and positive numbers, including zero.
  • By default, when you declare an int, it is treated as a signed integer.

Number of values per sign in a signed int:

  • The MSB indicates the sign, which leaves 31 bits for the rest of the value.
  • Thus the number of distinct values for each sign is:
values per sign = 2 ^ (n-1)
                = 2 ^ 31
                = 2,147,483,648

Range of signed int:

  • Since the MSB indicates the sign of the value, 31 bits remain for the rest of the value.
  • The full set of values is divided into two ranges, positive and negative.

Positive Range:

  • 31 bits remain once the sign bit is 0.
  • They give 2^31 = 2,147,483,648 distinct values.
  • The positive range runs from 0 to 2^(n-1) - 1.
  • Note: n in the formula is 32.
Possible distinct values = 2^31
                         = 2,147,483,648
Minimum Positive Range = 0
Maximum Positive Range = (2^31) - 1
                       = 2,147,483,648 - 1
                       = 2,147,483,647

Note: 1 is subtracted from the maximum because 0 is counted as part of the positive range.

Negative Range:

  • The negative range runs from -2^(n-1) to -1.
Possible distinct values = 2^(n-1)
                         = 2^(32-1)
                         = 2^31
                         = 2,147,483,648
Largest negative value  = -1
Smallest negative value = -(2^(32-1))
                        = -(2^31)
                        = -2,147,483,648
Signed int (32-bit)    Range
Negative Range         -2,147,483,648 to -1
Positive Range         0 to 2,147,483,647
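
Going past either end of this range is undefined behavior for signed integers, so a program should check before it adds. A minimal sketch, with additionOverflows as a hypothetical helper name:

```cpp
#include <limits>

// Hypothetical helper: true when a + b would fall outside the range of int.
// Signed overflow is undefined behavior, so the check happens before adding.
constexpr bool additionOverflows(int a, int b) {
    return (b > 0 && a > std::numeric_limits<int>::max() - b) ||
           (b < 0 && a < std::numeric_limits<int>::min() - b);
}
```

The check rearranges the comparison so that no intermediate result ever leaves the valid range.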

(Ⅱ) Unsigned int:

An unsigned int can only represent non-negative values (zero and positive values). No bit is reserved for the sign, so all 32 bits contribute to the magnitude. You can explicitly declare an unsigned integer using the unsigned keyword.

The number of distinct values of an unsigned int is as follows:

number of values = 2^n
                 = 2^32
                 = 4,294,967,296

Range of unsigned int:

  • 0 is the smallest value an unsigned int can hold.
  • The range is then as follows:
Possible distinct values = 2^32
                         = 4,294,967,296
Minimum value = 0
Maximum value = (2^n)-1
              = (2^32)-1
              = 4,294,967,296 - 1
              = 4,294,967,295
Type            Can Represent     Range (for 32-bit)
Unsigned int    Zero, positive    0 to 4,294,967,295
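
Unlike signed integers, unsigned integers wrap around when they leave this range. The sketch below shows both directions; decrementWrapping and incrementWrapping are hypothetical helper names.

```cpp
#include <limits>

// Unsigned arithmetic wraps around modulo 2^n instead of overflowing.
constexpr unsigned int decrementWrapping(unsigned int value) {
    return value - 1u;   // 0 wraps to the maximum value
}

constexpr unsigned int incrementWrapping(unsigned int value) {
    return value + 1u;   // the maximum value wraps to 0
}
```

With a 32-bit unsigned int, decrementWrapping(0) is 4,294,967,295, which is a common source of bugs in loops that count down to zero.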

long:

  • At least 4 bytes (32 bits). It is 8 bytes on most 64-bit Linux and macOS systems, but stays 4 bytes on 64-bit Windows.
  • Can be signed or unsigned.
    • Range of long (when it is 32 bits):
      • signed long: -2,147,483,648 to 2,147,483,647
      • unsigned long: 0 to 4,294,967,295
    • When long is 64 bits, its range matches that of long long.
  • Used for larger integers.

long long:

  • At least 8 bytes.
  • Can be signed or unsigned.
    • Range of long long:
      • signed long long: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
      • unsigned long long: 0 to 18,446,744,073,709,551,615.
  • Used for very large integers.
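
A typical reason to reach for long long is a result that no longer fits in 32 bits. A minimal sketch, with secondsToNanoseconds as a hypothetical helper name:

```cpp
#include <climits>

// long long is guaranteed to be at least 64 bits wide.
static_assert(sizeof(long long) * CHAR_BIT >= 64, "at least 64 bits");

// Hypothetical helper: the result exceeds a 32-bit int after only a few
// seconds, so the arithmetic is done in long long (note the LL suffix).
constexpr long long secondsToNanoseconds(long long seconds) {
    return seconds * 1000000000LL;
}
```

Three seconds already give 3,000,000,000 nanoseconds, which is above the 2,147,483,647 limit of a 32-bit signed int.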

2️⃣ Floating-Point Types:

 

3️⃣ Boolean Type:

bool is a built-in type in C++; it needs no header.

  • Size:
    • The size of a bool can vary depending on the compiler and platform, but it is typically 1 byte.
    • The C++ standard does not specify the exact size, but it guarantees that bool is large enough to hold at least the values true and false.
  • Values:
    • true: Represents a logical true value.
    • false: Represents a logical false value.
  • From Other Types:
    • Non-zero integer values are converted to true.
    • Zero is converted to false.
    • For example, in conditions like if (5) or if (0), the value is converted to true and false respectively.
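
The conversions listed above work in both directions. A minimal sketch, with toBool and toInt as hypothetical helper names:

```cpp
// Hypothetical helpers showing the conversions in both directions.
constexpr bool toBool(int value) {
    return static_cast<bool>(value);   // any non-zero value becomes true
}

constexpr int toInt(bool flag) {
    return static_cast<int>(flag);     // false becomes 0, true becomes 1
}
```

Negative numbers are non-zero too, so toBool(-3) is true.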

4️⃣ Void Type:

It is a special type that represents the absence of a type. It is used in several contexts:

  1. Void Functions
    1. When a function does not return a value, it is declared with the void return type:
    2. void printMessage() {
          std::cout << "Hello, World!" << std::endl;
      }
      
  2. Void Pointers
    1. A void pointer, or void*, is a pointer that can point to any data type. It is used when the type of the data being pointed to is not known or is irrelevant to the function being written. It is often used for generic programming and functions that handle data of unknown type. However, before dereferencing a void*, it must be cast to another pointer type.
    2. void* ptr;
      int num = 10;
      ptr = &num; // void pointer pointing to an int
      
      // To use the pointer, it needs to be cast to the correct type
      int* intPtr = static_cast<int*>(ptr);
      std::cout << *intPtr << std::endl; // outputs 10
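
Since a void* carries no type information, code that receives one needs to be told what it points to. The sketch below shows one possible way to do that; ValueKind and readAsDouble are hypothetical names, not part of any library.

```cpp
// Hypothetical tag describing what a void* really points to.
enum class ValueKind { Integer, Floating };

// A void* carries no type information, so the caller must supply it.
// Passing the wrong kind for the pointed-to object is undefined behavior.
double readAsDouble(const void* data, ValueKind kind) {
    if (kind == ValueKind::Integer) {
        return static_cast<double>(*static_cast<const int*>(data));
    }
    return *static_cast<const double*>(data);
}
```

In modern C++, templates are usually a safer choice for generic code because the compiler keeps track of the type.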

🥈 Derived Data Types

These types are derived from the fundamental data types:

1️⃣ Arrays

An array is a collection of elements of the same type stored in contiguous memory locations. Arrays allow you to store multiple values of the same type together.

int numbers[5] = {1, 2, 3, 4, 5};
char letters[3] = {'a', 'b', 'c'};
  • Accessing Elements: You can access array elements using indices, starting from 0.
int firstNumber = numbers[0]; // Access the first element
letters[1] = 'z'; // Modify the second element
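
Looping over an array is the most common way to use it. The sketch below sums a built-in array and recovers its length; sumOf and lengthOf are hypothetical helper names.

```cpp
#include <cstddef>

// Hypothetical helper: sums a built-in array. Taking the array by reference
// lets the compiler deduce its length N, so no separate size is needed.
template <std::size_t N>
int sumOf(const int (&values)[N]) {
    int total = 0;
    for (int value : values) {
        total += value;
    }
    return total;
}

// Hypothetical helper: number of elements in a built-in array.
template <typename T, std::size_t N>
constexpr std::size_t lengthOf(const T (&)[N]) {
    return N;
}
```

Valid indices run from 0 to lengthOf(array) - 1; reading or writing outside that range is undefined behavior.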

2️⃣ Pointers

A pointer is a variable that stores the memory address of another variable. Pointers are used for dynamic memory allocation, passing variables by reference, and working with arrays and functions.

int var = 10;
int* ptr = &var; // Pointer to an integer, storing the address of var
  • Dereferencing Pointers: You can access the value stored at the pointer's address using the dereference operator *.
  • int value = *ptr; // Dereference the pointer to get the value of var
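  • Null Pointer: A pointer set to nullptr points to no object. An uninitialized pointer is not automatically null; its value is indeterminate, so initialize pointers explicitly.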
  • int* nullPtr = nullptr;
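
Dereferencing a null pointer is undefined behavior, so code that accepts pointers usually checks first. A minimal sketch, with valueOr and addOne as hypothetical helper names:

```cpp
// Hypothetical helper: dereference only when the pointer is not null.
int valueOr(const int* pointer, int fallback) {
    return pointer != nullptr ? *pointer : fallback;
}

// Writing through a pointer changes the variable it points to.
void addOne(int* pointer) {
    if (pointer != nullptr) {
        ++(*pointer);
    }
}
```

Calling addOne(&var) changes var itself, while addOne(nullptr) safely does nothing.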
    

3️⃣ References

A reference is an alias for another variable. Once a reference is initialized to a variable, it cannot be changed to refer to another variable. References are used for passing arguments to functions by reference.

int num = 10;
int& ref = num; // Reference to the variable num
  • Using References: You can use a reference like the original variable.
  • ref = 20; // This will change the value of num to 20
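
Passing arguments by reference is where references are used most. The sketch below contrasts a reference parameter with a value parameter; swapValues and resetCopy are hypothetical helper names.

```cpp
// Parameters taken by reference alias the caller's variables.
void swapValues(int& first, int& second) {
    int temporary = first;
    first = second;
    second = temporary;
}

// Taken by value, the parameter is a copy; the caller's variable is untouched.
void resetCopy(int value) {
    value = 0;
}
```

After swapValues(a, b) the caller's a and b have exchanged values, whereas resetCopy(a) leaves a as it was.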
    

4️⃣ Function Types

Functions in C++ have types determined by their return type and parameter list. Function pointers can store the address of a function and be used to call the function indirectly.

  • Function Declaration:
    • int add(int a, int b) {
          return a + b;
      }
      
  • Function Pointer:
    • int (*funcPtr)(int, int) = &add;
      int result = funcPtr(2, 3); // Calls the add function through the function pointer
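
Function pointers become useful when the function to call is chosen by the caller. A minimal sketch follows; BinaryOperation, multiply, and apply are hypothetical names added for illustration.

```cpp
// A named alias makes the function pointer type easier to read.
using BinaryOperation = int (*)(int, int);

constexpr int add(int a, int b) { return a + b; }
constexpr int multiply(int a, int b) { return a * b; }

// Hypothetical helper: the operation to run is chosen by the caller.
constexpr int apply(BinaryOperation operation, int a, int b) {
    return operation(a, b);
}
```

The same apply function returns 5 for apply(add, 2, 3) and 6 for apply(multiply, 2, 3).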
      

🥉 User-Defined Data Types

These types are defined by the user to create more complex data structures:

1️⃣ Classes

Classes are blueprints for creating objects.

  • By default, the members of a class are private, whereas in a struct, they are public.
#include <iostream>
#include <string>
using namespace std;

class Car {
private:
    string brand;
    int year;

public:
    void setBrand(string b) {
        brand = b;
    }

    void setYear(int y) {
        year = y;
    }

    void display() {
        cout << "Brand: " << brand << endl;
        cout << "Year: " << year << endl;
    }
};

int main() {
    Car car1;
    car1.setBrand("Toyota");
    car1.setYear(2020);
    car1.display();
    
    return 0;
}
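
One possible variation of the class above uses a constructor, so that an object can never exist with an unset brand or year. This is a sketch of that design choice, not a replacement for the original example; the trailing underscores on member names are just a naming convention assumed here.

```cpp
#include <string>
#include <utility>

class Car {
public:
    // The constructor initializes every member, so no Car is ever half-built.
    Car(std::string brand, int year) : brand_(std::move(brand)), year_(year) {}

    // const member functions promise not to modify the object.
    const std::string& brand() const { return brand_; }
    int year() const { return year_; }

private:
    std::string brand_;
    int year_;
};
```

A car is then created in one step, for example Car car1("Toyota", 2020);, and its data can be read but not changed from outside the class.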

2️⃣ Structures

A structure is a collection of variables of different types under a single name. Structures are used to represent a record.

Similar to classes but with public members by default.

#include <iostream>
#include <string>
using namespace std;

struct Person {
    string name;
    int age;
};

int main() {
    Person p1;
    p1.name = "John";
    p1.age = 30;
    
    cout << "Name: " << p1.name << endl;
    cout << "Age: " << p1.age << endl;
    
    return 0;
}
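
Because all members of a simple struct are public, it can be filled in one step with brace initialization. A minimal sketch, with describe as a hypothetical helper name:

```cpp
#include <string>

struct Person {
    std::string name;
    int age;
};

// Hypothetical helper: builds a short description from the members.
std::string describe(const Person& person) {
    return person.name + " (" + std::to_string(person.age) + ")";
}
```

Writing Person p1{"John", 30}; sets both members in declaration order, and describe(p1) then returns "John (30)".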

3️⃣ Unions

A union is a special data type that allows storing different data types in the same memory location. Only one of the members can be used at a time, and a union occupies at least as much memory as its largest data member requires.

#include <iostream>
using namespace std;

union Data {
    int intValue;
    float floatValue;
    char charValue;
};

int main() {
    Data data;
    data.intValue = 10;
    cout << "Integer: " << data.intValue << endl;
    
    data.floatValue = 3.14f;
    cout << "Float: " << data.floatValue << endl;
    
    data.charValue = 'A';
    cout << "Character: " << data.charValue << endl;
    
    return 0;
}
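
Since only the member written last may be read, a union is often paired with a tag that records which member that is. The sketch below is one possible arrangement; TaggedData, makeInteger, makeFloating, and toDouble are hypothetical names.

```cpp
union Data {
    int intValue;
    float floatValue;
    char charValue;
};

// All members share storage, so the union is at least as big as each member.
static_assert(sizeof(Data) >= sizeof(int), "room for the int member");
static_assert(sizeof(Data) >= sizeof(float), "room for the float member");

// Hypothetical wrapper that remembers which member was written last.
struct TaggedData {
    enum class Kind { Integer, Floating } kind;
    Data data;
};

TaggedData makeInteger(int value) {
    TaggedData tagged;
    tagged.kind = TaggedData::Kind::Integer;
    tagged.data.intValue = value;
    return tagged;
}

TaggedData makeFloating(float value) {
    TaggedData tagged;
    tagged.kind = TaggedData::Kind::Floating;
    tagged.data.floatValue = value;
    return tagged;
}

// Reads only the member that the tag says is active.
double toDouble(const TaggedData& tagged) {
    return tagged.kind == TaggedData::Kind::Integer
               ? static_cast<double>(tagged.data.intValue)
               : static_cast<double>(tagged.data.floatValue);
}
```

In C++17 and later, std::variant from <variant> offers the same idea with the bookkeeping done by the library.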

4️⃣ Enumerations

An enumeration is a user-defined data type that consists of integral constants. Each integral constant is given a name, which makes the code more readable and maintainable.

#include <iostream>
using namespace std;

enum Color {
    RED,
    GREEN,
    BLUE
};

int main() {
    Color c = GREEN;
    
    if (c == GREEN) {
        cout << "The color is green." << endl;
    }
    
    return 0;
}
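
C++11 also offers scoped enumerations (enum class), which keep the names inside the enumeration and block implicit conversion to int. A minimal sketch, with nameOf as a hypothetical helper name:

```cpp
// A scoped enumeration: names do not leak into the surrounding scope
// and values do not convert to int implicitly.
enum class Color { Red, Green, Blue };

// Hypothetical helper: maps each enumerator to a printable name.
constexpr const char* nameOf(Color color) {
    return color == Color::Red     ? "red"
           : color == Color::Green ? "green"
                                   : "blue";
}

// Enumerators are numbered from 0 unless told otherwise.
static_assert(static_cast<int>(Color::Red) == 0, "first enumerator is 0");
static_assert(static_cast<int>(Color::Blue) == 2, "third enumerator is 2");
```

With a scoped enumeration the value has to be written as Color::Green, and getting its number requires an explicit static_cast.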

5️⃣ Typedef

typedef gives a new name (an alias) to an existing data type; it does not create a new type.

typedef unsigned int uint;
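
Since C++11 the same alias can also be written with using. The sketch below shows that both forms name the very same type; uint_typedef and uint_alias are hypothetical names chosen to avoid clashing with system headers.

```cpp
#include <type_traits>

typedef unsigned int uint_typedef;   // classic typedef syntax
using uint_alias = unsigned int;     // equivalent alias declaration (C++11)

// Both names refer to the very same type as unsigned int.
static_assert(std::is_same<uint_typedef, unsigned int>::value, "same type");
static_assert(std::is_same<uint_alias, uint_typedef>::value, "same type");
```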