The Basics of Memory Access
The CPU reads data from and writes data to memory. Although memory is byte-addressable, modern CPUs are not optimized to move one byte at a time. Instead, they typically access memory in chunks of 2, 4, 8, 16, or even 32 bytes. The CPU's natural chunk size is known as its "word", and it usually corresponds to the register width or the data bus width of the CPU.
What is Memory Alignment?
Memory alignment refers to the arrangement of data in memory so that it adheres to specific address boundaries. Data is aligned when it is stored at an address that is a multiple of its alignment requirement, which for basic types is usually their size. For instance, a 4-byte integer is ideally stored at an address that is a multiple of 4 (e.g., 0x0004, 0x0008, etc.). Properly aligned data lets the CPU read and write it more efficiently, reducing the number of memory accesses required and thus speeding up operations.
Alignment Boundaries:
Data is typically aligned on boundaries that are multiples of its size. Here are some common alignment rules for basic types on a typical 32-bit system:
- char: 1-byte alignment (no specific alignment requirement).
- It can be placed at any address.
- short: 2-byte alignment (must be stored at addresses divisible by 2).
- It should be placed at even addresses.
- int: 4-byte alignment (must be stored at addresses divisible by 4).
- It should be placed at addresses that are multiples of 4.
- float: 4-byte alignment (must be stored at addresses divisible by 4).
- It should be placed at addresses that are multiples of 4.
- double: 8-byte alignment (must be stored at addresses divisible by 8).
- It should be placed at addresses that are multiples of 8.
- long: 4-byte alignment (on a 32-bit system).
- It should be placed at addresses divisible by 4.
- long long: 8-byte alignment.
- It should be placed at addresses divisible by 8.
- long double: 8-byte or 16-byte alignment (depends on the system).
- It should be placed at addresses divisible by 8 or 16.
- pointer: 4-byte alignment (on a 32-bit system).
- It should be placed at addresses divisible by 4.
The table below shows which accesses are aligned and which are unaligned for each address and access size.
Address | Byte (8 bits) | 2 Bytes (16 bits) | 4 Bytes (32 bits) | 8 Bytes (64 bits) |
---|---|---|---|---|
0x0 | aligned | aligned | aligned | aligned |
0x1 | aligned | unaligned | unaligned | unaligned |
0x2 | aligned | aligned | unaligned | unaligned |
0x3 | aligned | unaligned | unaligned | unaligned |
0x4 | aligned | aligned | aligned | unaligned |
0x5 | aligned | unaligned | unaligned | unaligned |
0x6 | aligned | aligned | unaligned | unaligned |
0x7 | aligned | unaligned | unaligned | unaligned |
0x8 | aligned | aligned | aligned | aligned |
0x9 | aligned | unaligned | unaligned | unaligned |
0xA | aligned | aligned | unaligned | unaligned |
0xB | aligned | unaligned | unaligned | unaligned |
0xC | aligned | aligned | aligned | unaligned |
0xD | aligned | unaligned | unaligned | unaligned |
0xE | aligned | aligned | unaligned | unaligned |
0xF | aligned | unaligned | unaligned | unaligned |
As a practical note, for access sizes up to 16 bytes, an access is aligned if the rightmost digit of the address (written in hexadecimal) is divisible by the number of bytes being accessed.
Why Alignment Matters
- Performance: Aligned memory accesses are faster because the CPU can read or write an entire word in a single operation. Misaligned accesses may require multiple operations, additional processing, and memory fetches, leading to performance degradation.
- Correctness: Some CPUs enforce alignment requirements and generate faults or exceptions on misaligned accesses. Ensuring proper alignment prevents such issues and enhances software stability.
- Hardware Optimization: Modern CPUs and memory subsystems are optimized for aligned accesses. A naturally aligned value never straddles two cache lines, so it can be served by a single cache access and benefits fully from prefetching, further boosting performance.
Memory Alignment in Different Architectures
The alignment requirements and the way CPUs handle misaligned accesses vary across different architectures:
- x86 Architecture: Generally tolerant of misaligned accesses but at the cost of performance penalties due to additional processing.
- ARM Architecture: Earlier versions strictly required aligned accesses, while modern ARM CPUs can handle misaligned accesses but with a performance hit.
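- PowerPC Architecture: Behavior depends on the implementation and the instruction. Many cores handle common misaligned accesses in hardware, while other cores and certain instructions raise an alignment exception.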
The Importance of Memory Alignment
To fully understand the significance of memory alignment, consider a scenario where data is misaligned. Suppose you have a 4-byte int stored at an address that is not divisible by 4. The CPU would need to perform two memory accesses to read or write this data, first accessing part of the data from one memory address and then accessing the remaining part from the next address. This not only doubles the number of memory accesses but also introduces additional computational overhead to merge or split the data.
In contrast, when data is aligned properly, the CPU can read or write the entire unit in a single, efficient memory access. This is why programming languages and compilers often include mechanisms to ensure proper alignment, and why developers need to be mindful of alignment when optimizing performance-critical code.
CPU Aligned and Misaligned Memory Read:
The CPU tries to read data in units of its word size for efficiency. The word size of a CPU typically refers to the number of bits it can process in a single instruction. For example, a 32-bit system has a 32-bit (4-byte) word, and a 64-bit system has a 64-bit (8-byte) word.
For example, consider a struct in memory that looks like this:
struct Example {
    char a;  // one byte
    int b;   // four bytes
    short c; // two bytes
};
On a 32-bit processor it would most likely be laid out with a at 0x0000, followed by three padding bytes, then b at 0x0004 and c at 0x0008.
The processor can read each of these members in a single memory access.
Accessing char a takes a single read, since 0x0000 is 4-byte aligned. Accessing int b is just as easy, because the CPU reads it at 0x0004, which is also 4-byte aligned. Finally, short c is easy to access as well, since it too starts at a 4-byte-aligned address.
If you use the packed attribute on your structure, the compiler will not add padding, so the members are no longer aligned to their natural boundaries.

With packing, the structure is laid out in memory as follows:
- char a is at address 0x0000.
- int b starts at address 0x0001 (this is misaligned, since int typically needs to be on a 4-byte boundary).
- short c starts at address 0x0005 (this is also misaligned, since short typically needs to be on a 2-byte boundary).
| Address | 0x0000 | 0x0001 | 0x0002 | 0x0003 | 0x0004 | 0x0005 | 0x0006 | 0x0007 |
|---------|--------|--------|--------|--------|--------|--------|--------|--------|
| Data | a | b1 | b2 | b3 | b4 | c1 | c2 | ...... |
Here, b1, b2, b3, and b4 are the bytes of the int b, with b1 being the LSB and b4 the MSB.
-: Reading char a :-
- Address: 0x0000
- Size: 1 byte
Since char a is only 1 byte, it can be read directly from memory without any issues, regardless of alignment. The CPU fetches the byte at 0x0000.
| Address | 0x0000 | 0x0001 | 0x0002 | 0x0003 |
|---------|--------|--------|--------|--------|
| Data | a | b1 | b2 | b3 |
- Single Fetch:
  - The CPU performs a read operation at address 0x0000 to fetch the byte a.
  - The CPU reads the 4-byte word starting at 0x0000, which fetches the data [a, b1, b2, b3], and takes a from it.
-: Reading int b :-
- First Fetch:
  - The CPU reads the 4-byte word starting at 0x0000, getting the data [a, b1, b2, b3].
- Second Fetch:
  - The CPU reads the 4-byte word starting at 0x0004, getting the data [b4, c1, c2, ......].
- Combining Data:
  - The CPU then extracts the relevant bytes from these reads to form the 4-byte int b.
  - From the first fetch, it takes b1, b2, b3.
  - From the second fetch, it takes b4.
- Shifting and Combining:
  - The CPU aligns these bytes correctly to form the integer.
int b = (b4 << 24) | (b3 << 16) | (b2 << 8) | b1;
| Address | 0x0000 | 0x0001 | 0x0002 | 0x0003 | 0x0004 | 0x0005 | 0x0006 | 0x0007 |
|---------|--------|--------|--------|--------|--------|--------|--------|--------|
| Data | a | b1 | b2 | b3 | b4 | c1 | c2 | ...... |
The first fetch covers addresses 0x0000 through 0x0003, and the second fetch covers 0x0004 through 0x0007.
Extra Operations Required:
- 2 read operations (one starting at 0x0000 and another at 0x0004).
- Additional steps to extract and combine bytes (this is typically handled internally by the CPU but can be considered extra processing overhead).
-: Reading short c :-
c1 and c2 are the bytes of the short c, with c1 being the LSB and c2 the MSB in a little-endian system.
- First Fetch:
  - The CPU reads the 4-byte word starting at 0x0004, getting the data [b4, c1, c2, ......].
- Extracting Data:
  - From the fetched data, the CPU needs the bytes starting from 0x0005 to form the 2-byte short c.
  - It extracts c1 and c2.
- Combining Bytes:
  - In a little-endian system, the combination is done as follows:
short c = (c2 << 8) | c1;
| Address | 0x0004 | 0x0005 | 0x0006 | 0x0007 |
|---------|--------|--------|--------|--------|
| Data | b4 | c1 | c2 | ...... |
A single 4-byte fetch covers addresses 0x0004 through 0x0007.
Visualizing Memory Alignment
Let's visualize memory alignment with an example. Consider a structure in C:
struct Example {
    char a;  // 1 byte
    int b;   // 4 bytes
    short c; // 2 bytes
};
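With natural alignment, the compiler inserts padding and the structure occupies the following bytes:
Byte Offset | Data |
---|---|
0 | a |
1 | padding |
2 | padding |
3 | padding |
4 | b (start) |
5 | b |
6 | b |
7 | b (end) |
8 | c (start) |
9 | c (end) |
10 | padding |
11 | padding |
Here, a occupies the first byte, but b must start at a 4-byte boundary, so bytes 1, 2, and 3 are padding. c starts at byte 8, which satisfies its 2-byte alignment. Bytes 10 and 11 are trailing padding: the structure's size is rounded up to a multiple of its strictest member alignment (4, from int) so that every element in an array of these structures stays aligned.
Visualizing this:
| Address | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---------|---|---|---|---|---|---|---|---|---|---|----|----|
| Data | a | P | P | P | b | b | b | b | c | c | P | P |
Here, P denotes padding bytes. The size of the structure is 12 bytes on a typical system, rather than the 7 bytes its members add up to.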
Implications of Misalignment
If data is misaligned, the CPU may need to perform more operations to access the data. For example, accessing a misaligned int might require two memory accesses instead of one, significantly impacting performance.
Consider accessing an int that is not aligned:
| Address | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---------|---|---|---|---|---|---|---|---|
| Data | a | b | b | b | b | c | c | |
Accessing the int that starts at address 1 would require the CPU to read parts of the int from two different memory locations, increasing the number of cycles needed.