Memory Alignment

Updated on 27 Jul, 2025 · 35 mins read · 378 views

The Basics of Memory Access

The CPU interacts with memory to read and write data. Although memory is byte-addressable, so a single byte can be accessed at a time, modern CPUs are not optimized to work that way. Instead, they typically transfer data in chunks of 2, 4, 8, 16, or even 32 bytes. The chunk size a CPU handles most naturally is called its "word", and it usually matches the width of the CPU's registers and data bus.

What is Memory Alignment?

Memory alignment refers to the way data is arranged in memory so that it adheres to specific address boundaries. Data is aligned when it is stored at an address that is a multiple of its alignment requirement, which for basic types is usually their size. For instance, a 4-byte integer is ideally stored at an address that is a multiple of 4 (e.g., 0x0004, 0x0008, etc.). Alignment affects performance because of how the CPU accesses memory: properly aligned data can be read and written with fewer memory accesses, which speeds up operations.

Alignment Boundaries:

Data is typically aligned on boundaries that are multiples of its size. Here are some common alignment rules for basic types on a typical 32-bit system (the exact values depend on the compiler and ABI):

char: 1-byte alignment (no specific alignment requirement).
- It can be placed at any address.
short: 2-byte alignment (must be stored at addresses divisible by 2).
- It should be placed at even addresses.
int: 4-byte alignment (must be stored at addresses divisible by 4).
- It should be placed at addresses that are multiples of 4.
float: 4-byte alignment (must be stored at addresses divisible by 4).
- It should be placed at addresses that are multiples of 4.
double: 8-byte alignment (must be stored at addresses divisible by 8).
- It should be placed at addresses that are multiples of 8.
long: 4-byte alignment (on a 32-bit system).
- It should be placed at addresses divisible by 4.
long long: 8-byte alignment.
- It should be placed at addresses divisible by 8.
long double: 4-byte, 8-byte, or 16-byte alignment (depends on the system).
- It should be placed at addresses divisible by 4, 8, or 16 accordingly.
pointer: 4-byte alignment (on a 32-bit system).
- It should be placed at addresses divisible by 4.

Below is the table of Aligned and Unaligned memory access based on address and access size.

Address	Byte (8 bits)	2 Bytes (16 bits)	4 Bytes (32 bits)	8 Bytes (64 bits)
0x0	aligned	aligned	aligned	aligned
0x1	aligned	unaligned	unaligned	unaligned
0x2	aligned	aligned	unaligned	unaligned
0x3	aligned	unaligned	unaligned	unaligned
0x4	aligned	aligned	aligned	unaligned
0x5	aligned	unaligned	unaligned	unaligned
0x6	aligned	aligned	unaligned	unaligned
0x7	aligned	unaligned	unaligned	unaligned
0x8	aligned	aligned	aligned	aligned
0x9	aligned	unaligned	unaligned	unaligned
0xA	aligned	aligned	unaligned	unaligned
0xB	aligned	unaligned	unaligned	unaligned
0xC	aligned	aligned	aligned	unaligned
0xD	aligned	unaligned	unaligned	unaligned
0xE	aligned	aligned	unaligned	unaligned
0xF	aligned	unaligned	unaligned	unaligned

As a practical note, if the rightmost hexadecimal digit of the address is divisible by the number of bytes accessed, the access is aligned. This shortcut works for power-of-two access sizes up to 16 bytes, since one hexadecimal digit covers 16 addresses.
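The same check can be written in code. This is a small sketch, assuming the access size is a power of two; is_aligned is a hypothetical helper name, not a standard function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when `address` is a multiple of `size`.
   Assumes `size` is a power of two (1, 2, 4, 8, ...), so the test
   reduces to checking that the low bits of the address are zero. */
bool is_aligned(uintptr_t address, size_t size)
{
    return (address & (size - 1)) == 0;
}
```

For example, is_aligned(0xC, 4) is true and is_aligned(0xC, 8) is false, matching the 0xC row of the table.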

Why Alignment Matters

Performance: Aligned memory accesses are faster because the CPU can read or write an entire word in a single operation. Misaligned accesses may require multiple operations, additional processing, and memory fetches, leading to performance degradation.
Correctness: Some CPUs enforce alignment requirements and generate faults or exceptions on misaligned accesses. Ensuring proper alignment prevents such issues and enhances software stability.
Hardware Optimization: Modern CPUs and memory subsystems are optimized for aligned accesses. A naturally aligned value never straddles a cache line boundary, so it can be served from a single cache line, and aligned data works well with hardware features such as prefetching.

Memory Alignment in Different Architectures

The alignment requirements and the way CPUs handle misaligned accesses vary across different architectures:

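x86 Architecture: Generally tolerant of misaligned accesses, though they can carry a performance penalty, especially when an access crosses a cache line boundary. Some SIMD instructions (for example, the aligned SSE loads and stores) fault on misaligned addresses.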
ARM Architecture: Earlier versions strictly required aligned accesses, while modern ARM CPUs can handle misaligned accesses but with a performance hit.
PowerPC Architecture: Behavior depends on the implementation. Many processors handle common misaligned accesses in hardware, while others raise an alignment exception for some or all of them.
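Because behavior differs between architectures, code that must read a value from a possibly misaligned address (for example, a field inside a byte buffer) usually avoids dereferencing a cast pointer. A common portable idiom is to copy the bytes with memcpy and let the compiler pick suitable instructions. The sketch below assumes a hypothetical helper named load_u32.

```c
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value from any address, aligned or not.
   memcpy has no alignment requirement, so this is well defined on
   every architecture; the result uses the host's byte order. */
uint32_t load_u32(const void *p)
{
    uint32_t value;
    memcpy(&value, p, sizeof value);
    return value;
}
```

On architectures that tolerate misaligned loads, compilers typically reduce this to a single load instruction, so the idiom usually costs nothing there.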

The Importance of Memory Alignment

To fully understand the significance of memory alignment, consider a scenario where data is misaligned. Suppose you have a 4-byte int stored at an address that is not divisible by 4. A CPU that fetches memory in aligned 4-byte words would need two memory accesses to read or write this data, first accessing part of the data from one word and then accessing the remaining part from the next word. This not only doubles the number of memory accesses but also introduces additional computational overhead to merge or split the data.

In contrast, when data is aligned properly, the CPU can read or write the entire unit in a single, efficient memory access. This is why programming languages and compilers often include mechanisms to ensure proper alignment, and why developers need to be mindful of alignment when optimizing performance-critical code.
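In C11, two such mechanisms are the alignas specifier and the aligned_alloc function. The sketch below is one way to use them, assuming a C11 compiler and standard library; struct AlignedBuffer and alloc_aligned64 are hypothetical names chosen for the example, and the 16 and 64 byte boundaries are arbitrary.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* A buffer whose storage is forced onto a 16-byte boundary. */
struct AlignedBuffer {
    alignas(16) unsigned char bytes[32];
};

/* Allocates at least `size` bytes on a 64-byte boundary.
   aligned_alloc expects the size to be a multiple of the alignment,
   so the request is rounded up first. Release the result with free(). */
void *alloc_aligned64(size_t size)
{
    size_t rounded = (size + 63) & ~(size_t)63;
    return aligned_alloc(64, rounded);
}
```

Most code never needs either one, because the compiler and malloc already satisfy the alignment of the basic types. They matter mainly for SIMD buffers, DMA regions, and data that should start on a cache line boundary.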

CPU Aligned and Misaligned Memory Read:

For efficiency, the CPU tries to read data in units of its word size. The word size of a CPU typically refers to the number of bits it can process in a single instruction. For example, the word size of a 32-bit system is 32 bits (4 bytes), and the word size of a 64-bit system is 64 bits (8 bytes).

For example, consider a struct in memory that looks like this:

struct Example {
    char a;  // one byte
    int b;   // four bytes
    short c; // two bytes
};

On a 32-bit processor, the compiler would most likely lay it out with char a at address 0x0000 followed by three padding bytes, int b at 0x0004, and short c at 0x0008.

The processor can read each of these members in a single memory access.

Suppose you access char a: the CPU reads it in a single access, since any address is valid for a 1-byte value. If you access int b, the CPU reads it at address 0x0004, which is 4-byte aligned. Finally, short c sits at 0x0008, which satisfies its 2-byte alignment (and happens to be 4-byte aligned as well), so it is also read in a single access.
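The offsets the compiler actually chose can be inspected with the standard offsetof macro. This is a minimal sketch; print_layout is a hypothetical function name, and the values in the comments assume a common platform where int is 4 bytes and short is 2 bytes.

```c
#include <stddef.h>
#include <stdio.h>

struct Example {
    char a;  /* one byte */
    int b;   /* four bytes */
    short c; /* two bytes */
};

/* Prints where each member of struct Example starts and how large
   the whole struct is on the current platform. */
void print_layout(void)
{
    printf("offset of a: %zu\n", offsetof(struct Example, a)); /* typically 0 */
    printf("offset of b: %zu\n", offsetof(struct Example, b)); /* typically 4 */
    printf("offset of c: %zu\n", offsetof(struct Example, c)); /* typically 8 */
    printf("sizeof:      %zu\n", sizeof(struct Example));
}
```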

If you apply the packed attribute to the structure (for example, __attribute__((packed)) in GCC and Clang), the compiler does not insert padding to align its members.

With packing, the structure is laid out in memory as follows:

char a is at address 0x0000.
int b starts at address 0x0001 (this is misaligned since int typically needs to be on a 4-byte boundary).
short c starts at address 0x0005 (this is also misaligned since short typically needs to be on a 2-byte boundary).

| Address | 0x0000 | 0x0001 | 0x0002 | 0x0003 | 0x0004 | 0x0005 | 0x0006 | 0x0007 |
|---------|--------|--------|--------|--------|--------|--------|--------|--------|
| Data    | a      | b1     | b2     | b3     | b4     | c1     | c2     | ...... |

Here, b1, b2, b3, and b4 are the bytes of the int b, with b1 being the least significant byte (LSB) and b4 the most significant byte (MSB) on a little-endian system.
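The packed layout can be reproduced with the GCC and Clang attribute syntax. The sketch below is compiler-specific (MSVC uses #pragma pack instead), and PackedExample and print_packed_layout are hypothetical names for this illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* GCC/Clang extension: members are placed back to back, no padding. */
struct __attribute__((packed)) PackedExample {
    char a;  /* offset 0 */
    int b;   /* offset 1, misaligned */
    short c; /* offset 5, misaligned */
};

/* Prints the offsets and total size of the packed struct. */
void print_packed_layout(void)
{
    printf("offset of b: %zu\n", offsetof(struct PackedExample, b));
    printf("offset of c: %zu\n", offsetof(struct PackedExample, c));
    printf("sizeof:      %zu\n", sizeof(struct PackedExample));
}
```

Reading p->b through the struct is safe because the compiler knows the member is packed and emits suitable code. Taking the address of b and dereferencing it as a plain int pointer is not, since that pointer may be misaligned.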

-: Reading `char a` :-

Address: 0x0000
Size: 1 byte

Since char a is only 1 byte, it can be read directly from memory without any issues, regardless of alignment. The CPU fetches the byte at 0x0000.

| Address | 0x0000 | 0x0001 | 0x0002 | 0x0003 |
|---------|--------|--------|--------|--------|
| Data    | a      | b1     | b2     | b3     |

Single Fetch:
- The CPU performs a read operation at the address 0x0000 to fetch the byte a.
- In practice, the CPU reads the 4-byte word starting at 0x0000, which fetches the data [a, b1, b2, b3], and then extracts the byte a from it.

-: Reading `int b` :-

First Fetch:
- The CPU reads the 4-byte word starting at 0x0000, getting the data: [a, b1, b2, b3].
Second Fetch:
- The CPU reads the 4-byte word starting at 0x0004, getting the data: [b4, c1, c2, ......].
Combining Data:
- The CPU then extracts the relevant bytes from these reads to form the 4-byte int b.
- From the first fetch: it takes b1, b2, b3.
- From the second fetch: it takes b4.
Shifting and Combining:
- The CPU aligns these bytes correctly to form the integer.
- int b = (b4 << 24) | (b3 << 16) | (b2 << 8) | b1;

| Address | 0x0000 | 0x0001 | 0x0002 | 0x0003 | 0x0004 | 0x0005 | 0x0006 | 0x0007 |
|---------|--------|--------|--------|--------|--------|--------|--------|--------|
| Data    | a      | b1     | b2     | b3     | b4     | c1     | c2     | ...... |
| Fetch   | first  | first  | first  | first  | second | second | second | second |

Extra Operations Required:

Two read operations (one starting at 0x0000 and another at 0x0004).
Additional steps to extract and combine bytes (this is typically handled internally by the CPU but can be considered as extra processing overhead).
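The two-fetch sequence can be modeled in software. The sketch below is a simulation under stated assumptions, not a description of any particular CPU: it assumes a little-endian memory that can only be fetched in aligned 4-byte words, and fetch_word and read_u32 are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Simulates one bus read: returns the aligned 4-byte word starting at
   `aligned_addr`, assembled in little-endian order. */
uint32_t fetch_word(const uint8_t *mem, size_t aligned_addr)
{
    return (uint32_t)mem[aligned_addr]
         | (uint32_t)mem[aligned_addr + 1] << 8
         | (uint32_t)mem[aligned_addr + 2] << 16
         | (uint32_t)mem[aligned_addr + 3] << 24;
}

/* Reads a 32-bit value at any address using only aligned fetches.
   Stores the number of fetches that were needed in `*fetches`. */
uint32_t read_u32(const uint8_t *mem, size_t addr, int *fetches)
{
    size_t base = addr & ~(size_t)3;              /* word containing addr */
    unsigned shift = (unsigned)(addr - base) * 8; /* bit offset in word   */

    uint32_t low = fetch_word(mem, base);
    *fetches = 1;
    if (shift == 0)
        return low;                               /* aligned: one fetch   */

    uint32_t high = fetch_word(mem, base + 4);    /* misaligned: second   */
    *fetches = 2;
    return (low >> shift) | (high << (32 - shift));
}
```

With the packed layout above, read_u32(mem, 0x0001, &n) performs two fetches and combines b1 through b4, while a read at the aligned address 0x0004 needs only one.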

-: Reading `short c` :-

c1 and c2 are the bytes of the short c, with c1 being the LSB and c2 the MSB in a little-endian system.

Single Fetch:
- The CPU reads the 4-byte word starting at 0x0004, getting the data: [b4, c1, c2, ......]. Although short c is misaligned, both of its bytes lie within this word, so one fetch is enough.
Extracting Data:
- From the fetched data, the CPU needs the bytes starting from 0x0005 to form the 2-byte short c.
- It extracts c1 and c2.
Combining Bytes:
- In a little-endian system, the combination is done as follows:
- short c = (c2 << 8) | c1;

| Address | 0x0004 | 0x0005 | 0x0006 | 0x0007 |
|---------|--------|--------|--------|--------|
| Data    | b4     | c1     | c2     | ...... |

The single 4-byte fetch covers addresses 0x0004 through 0x0007.
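Whether a misaligned access needs one fetch or two depends on whether it crosses a word boundary. The sketch below makes that test explicit, again assuming aligned 4-byte fetches and little-endian byte order; crosses_word and extract_u16 are hypothetical helper names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when an access of `size` bytes starting at `addr` spills over
   into the next aligned 4-byte word. */
bool crosses_word(size_t addr, size_t size)
{
    return (addr % 4) + size > 4;
}

/* Extracts a little-endian 16-bit value that starts `byte_offset`
   bytes (0, 1, or 2) into an already fetched 32-bit word. */
uint16_t extract_u16(uint32_t word, unsigned byte_offset)
{
    return (uint16_t)(word >> (byte_offset * 8));
}
```

For short c at 0x0005, crosses_word(0x0005, 2) is false, so one fetch plus a shift is enough. Had it started at 0x0007, the access would cross into the next word and need a second fetch.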

Visualizing Memory Alignment

Let's visualize memory alignment with an example. Consider a structure in C:

struct Example {
    char a;   // 1 byte
    int b;    // 4 bytes
    short c;  // 2 bytes
};

With alignment padding, the structure would occupy the following bytes:

Byte Offset	Data
0	a
1	padding
2	padding
3	padding
4	b (start)
5	b
6	b
7	b (end)
8	c (start)
9	c (end)
10	padding
11	padding

Here, a occupies the first byte, but b must start at a 4-byte boundary, so bytes 1, 2, and 3 are padding. c starts at byte 8, which already satisfies its 2-byte alignment. Bytes 10 and 11 are trailing padding: the compiler rounds the structure's size up to a multiple of its strictest member alignment (4, for int) so that every element in an array of these structures stays aligned.

Visualizing this:

| Address | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---------|---|---|---|---|---|---|---|---|---|---|----|----|
| Data    | a | P | P | P | b | b | b | b | c | c | P  | P  |

Here, P denotes padding bytes. The size of the structure is 12 bytes.
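The padding rules can be expressed as a small layout calculation. This is a sketch of the usual approach rather than any compiler's actual implementation: each member is placed at the next multiple of its alignment, and on common ABIs the total size is then rounded up to a multiple of the largest member alignment. The names align_up, struct Field, and layout are hypothetical.

```c
#include <stddef.h>

/* Rounds `offset` up to the next multiple of `align` (a power of two). */
size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

/* Size and alignment of one struct member. */
struct Field {
    size_t size;
    size_t align;
};

/* Computes the offset of each field into `offsets` and returns the
   total struct size, including any trailing padding. */
size_t layout(const struct Field *fields, size_t count, size_t *offsets)
{
    size_t offset = 0;
    size_t max_align = 1;

    for (size_t i = 0; i < count; i++) {
        offset = align_up(offset, fields[i].align); /* insert padding */
        offsets[i] = offset;
        offset += fields[i].size;
        if (fields[i].align > max_align)
            max_align = fields[i].align;
    }
    return align_up(offset, max_align);             /* trailing padding */
}
```

Feeding it the members of struct Example as {1, 1}, {4, 4}, {2, 2} yields offsets 0, 4, and 8. Reordering the members from largest to smallest is a common way to reduce the padding this calculation inserts.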

Implications of Misalignment

If data is misaligned, the CPU may need to perform more operations to access the data. For example, accessing a misaligned int might require two memory accesses instead of one, significantly impacting performance.

Consider accessing an int that is not aligned:

| Address | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---------|---|---|---|---|---|---|---|---|
| Data    | a | b | b | b | b | c | c |   |

Accessing the int that starts at address 1 would require the CPU to read parts of the int from two different memory locations, increasing the number of cycles needed.
