Instruction Syntax and Format

Assembly language, the closest human-readable representation of machine code, serves as the bridge between high-level programming languages and the underlying hardware architecture.

There are two main syntax forms for x86 assembly language: Intel syntax and AT&T syntax.

Intel Syntax:

Intel syntax is the original syntax, used by Intel and by many x86 assemblers. It is the default for assemblers such as NASM (Netwide Assembler) and MASM (Microsoft Macro Assembler).

1️⃣ Operand Order:

In Intel syntax, the destination operand precedes the source operand. For example, in a move instruction (MOV), the destination register or memory location is specified first, followed by the source register or immediate value.

Example:

mov eax, ebx  ; Move contents of ebx into eax

2️⃣ Immediate Values:

In Intel syntax, immediate values are written without any special prefix. Hexadecimal values are typically suffixed with an h, and a hexadecimal value that starts with a letter (A-F) must be prefixed with a 0 so the assembler does not read it as a label or variable name. NASM also accepts the C-style 0x prefix, but the prefix and the h suffix are never combined. Decimal values are written directly without any suffix.

Example:

mov eax, 123  ; Move immediate value 123 into eax
Hexadecimal Values:
mov eax, 0FFh      ; Move the hexadecimal value 0xFF into EAX
mov ebx, 1234h     ; Move the hexadecimal value 0x1234 into EBX
Decimal Values:
mov eax, 255       ; Move the decimal value 255 into EAX
mov ebx, 4660      ; Move the decimal value 4660 into EBX
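
The literal rules above can be checked with a short Python sketch. The helper name parse_intel_immediate is hypothetical and not part of any assembler. It accepts only the forms discussed here: decimal, an h-suffixed hexadecimal literal that starts with a digit, and the 0x prefix that NASM also accepts.

```python
def parse_intel_immediate(text: str) -> int:
    """Parse a decimal, h-suffixed hex, or 0x-prefixed hex literal (toy helper)."""
    token = text.strip().lower()
    if token.startswith("0x"):
        # C-style prefix; a trailing 'h' here is not a hex digit, so int() rejects it.
        return int(token, 16)
    if token.endswith("h"):
        if not token[0].isdigit():
            # Without a leading digit the token would look like a label name.
            raise ValueError(f"hex literal must start with a digit: {text!r}")
        return int(token[:-1], 16)
    return int(token, 10)


print(parse_intel_immediate("0FFh"))   # 255
print(parse_intel_immediate("1234h"))  # 4660
print(parse_intel_immediate("255"))    # 255
```

In this sketch, "FFh" (no leading digit) and "0x1234h" (prefix and suffix combined) both raise ValueError, which mirrors why such literals should be avoided.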

3️⃣ Register Naming:

In Intel syntax, registers are named directly without any prefixes. The register name itself indicates the operand size (for example, AL is 8 bits, AX is 16 bits, EAX is 32 bits, and RAX is 64 bits), and no additional symbols or characters precede it.

Here are the common register names in Intel syntax:

  • General Purpose Registers:
    • 8-bit: AL, BL, CL, DL, AH, BH, CH, DH
    • 16-bit: AX, BX, CX, DX, SI, DI, BP, SP
    • 32-bit: EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP
    • 64-bit: RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP, R8-R15
  • Segment Registers: CS, DS, ES, FS, GS, SS
  • Instruction Pointer: EIP (32-bit), RIP (64-bit)
  • Flags Register: EFLAGS (32-bit), RFLAGS (64-bit)
Move a value into a register:
mov eax, 10h       ; Move the immediate value 10h into EAX
mov al, 0FFh       ; Move the immediate value 0FFh into AL
mov rax, 1234h     ; Move the immediate value 1234h into RAX
Move a value between registers:
mov ebx, eax       ; Move the value in EAX to EBX
mov bl, al         ; Move the value in AL to BL
mov rbx, rax       ; Move the value in RAX to RBX

4️⃣ Memory Addressing:

In Intel syntax, memory operands are enclosed in square brackets ([]). The format for memory addressing can vary, but it generally follows the pattern:

[base + index * scale + displacement]

Where:

  • base: A base register (e.g., eax, ebx, ecx, edx, esi, edi, ebp, esp; in 64-bit code, their 64-bit counterparts and r8-r15)
  • index: An index register (any of the same registers except the stack pointer, esp/rsp)
  • scale: A scaling factor (1, 2, 4, or 8)
  • displacement: A constant value

Examples:

  • Simple Addressing:
mov eax, [ebx] ; Move the value at the memory address stored in EBX into EAX

mov [eax], ebx ; Move the value in EBX to the memory address stored in EAX
  • Complex Addressing:
mov eax, [ebx + 4]       ; Move the value at the memory address (EBX + 4) into EAX

mov eax, [ebx + ecx * 2]   ; Move the value at the memory address (EBX + ECX * 2) into EAX

mov eax, [ebx + ecx * 2 + 8] ; Move the value at the memory address (EBX + ECX * 2 + 8) into EAX
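
To make the arithmetic behind these addressing forms concrete, here is a minimal Python sketch that evaluates base + index * scale + displacement. The function name effective_address and the register values (EBX = 0x1000, ECX = 3) are assumptions chosen for illustration only.

```python
def effective_address(base=0, index=0, scale=1, displacement=0, bits=32):
    """Compute base + index * scale + displacement, wrapped to the address width."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    mask = (1 << bits) - 1  # 32-bit addressing wraps modulo 2**32
    return (base + index * scale + displacement) & mask


ebx, ecx = 0x1000, 3  # assumed register contents

print(hex(effective_address(base=ebx)))                                    # [ebx]             -> 0x1000
print(hex(effective_address(base=ebx, displacement=4)))                    # [ebx + 4]         -> 0x1004
print(hex(effective_address(base=ebx, index=ecx, scale=2)))                # [ebx + ecx*2]     -> 0x1006
print(hex(effective_address(base=ebx, index=ecx, scale=2, displacement=8)))  # [ebx + ecx*2 + 8] -> 0x100e
```

The scale factor is what makes this form convenient for arrays: with ECX as an element index and a scale equal to the element size, the instruction reaches the element without a separate multiply.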

5️⃣ Size of Data being Manipulated:

In Intel syntax, the size of the data being manipulated is indicated with a size specifier whenever it cannot be inferred from a register operand, for instance when an immediate value is stored to memory. The examples below use NASM's form; MASM writes the same specifier with ptr (for example, byte ptr [eax]).

  • The operand size can be a byte (8 bits), a word (16 bits), a double word (32 bits), or a quad word (64 bits).
Byte (8 bits):
mov byte [eax], 0xFF  ; Move the byte value 0xFF to the memory location pointed by EAX
Word (16 bits):
mov word [eax], 0xFFFF  ; Move the word value 0xFFFF to the memory location pointed by EAX
Double Word (32 bits):
mov dword [eax], 0xFFFFFFFF  ; Move the double word value 0xFFFFFFFF to the memory location pointed by EAX
Quad Word (64 bits):
mov qword [rax], 0xFFFFFFFFFFFFFFFF  ; Move the quad word value 0xFFFFFFFFFFFFFFFF to the memory location pointed by RAX
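
The effect of the size specifier can be simulated in a few lines of Python. This is only a sketch: the 8-byte bytearray stands in for memory starting at the address held in the register, and the names SIZES, store, and after_store are hypothetical. It shows how many bytes each store touches and that x86 writes them in little-endian order.

```python
SIZES = {"byte": 1, "word": 2, "dword": 4, "qword": 8}  # size specifier -> bytes written


def store(memory: bytearray, address: int, value: int, specifier: str) -> None:
    """Write the low bytes of value at address, least significant byte first."""
    size = SIZES[specifier]
    low_bits = value & ((1 << (8 * size)) - 1)
    memory[address:address + size] = low_bits.to_bytes(size, "little")


def after_store(value: int, specifier: str) -> str:
    """Return 8 bytes of zeroed memory as hex after one store at offset 0."""
    memory = bytearray(8)
    store(memory, 0, value, specifier)
    return memory.hex()


print(after_store(0xFF, "byte"))         # ff00000000000000
print(after_store(0xFFFF, "word"))       # ffff000000000000
print(after_store(0xFFFFFFFF, "dword"))  # ffffffff00000000
print(after_store(0x1234, "word"))       # 3412000000000000
```

The last line shows why the specifier matters: the same immediate written with a different size would overwrite a different number of neighbouring bytes.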

6️⃣ Segment Overrides:

When accessing memory, a segment override specifies a segment register other than the default. Both syntaxes allow segment overrides, but they write them differently. The example below uses the MASM form; NASM places the segment register inside the brackets ([ds:bx]).

mov ax, ds:[bx]     ; Move the value at the memory address in DS:BX into AX

AT&T Syntax:

AT&T syntax, commonly associated with Unix-based systems and the GNU Compiler Collection (GCC), follows a distinct convention. It differs from Intel syntax in several key ways:

1️⃣ Operand Order:

In AT&T syntax, the source operand comes before the destination operand. Thus, the source register or memory location is specified first, followed by the destination register or memory location.

Example:

mov %ebx, %eax  # Move contents of ebx into eax

2️⃣ Immediate Values:

In AT&T syntax, immediate values are prefixed with a dollar sign ($). Hexadecimal values are written with a 0x prefix, similar to C programming language syntax. Decimal values are also prefixed with a dollar sign but do not have any additional suffix.

Example:

movl $123, %eax    # Move the immediate value 123 into EAX
Hexadecimal Values:
movl $0xFF, %eax   # Move the hexadecimal value 0xFF into EAX
movl $0x1234, %ebx # Move the hexadecimal value 0x1234 into EBX
Decimal Values:
movl $255, %eax    # Move the decimal value 255 into EAX
movl $4660, %ebx   # Move the decimal value 4660 into EBX

3️⃣ Register Naming:

In AT&T syntax, registers are always prefixed with a percent sign (%). This prefix distinguishes register names from other operands and constants within the code. The actual register names remain the same, but the % prefix must be used consistently.

Here are the common register names in AT&T syntax:

  • General Purpose Registers:
    • 8-bit: %al, %bl, %cl, %dl, %ah, %bh, %ch, %dh
    • 16-bit: %ax, %bx, %cx, %dx, %si, %di, %bp, %sp
    • 32-bit: %eax, %ebx, %ecx, %edx, %esi, %edi, %ebp, %esp
    • 64-bit: %rax, %rbx, %rcx, %rdx, %rsi, %rdi, %rbp, %rsp, %r8-%r15
  • Segment Registers: %cs, %ds, %es, %fs, %gs, %ss
  • Instruction Pointer: %eip (32-bit), %rip (64-bit)
  • Flags Register: %eflags (32-bit), %rflags (64-bit)
Move a value into a register:
movl $0x10, %eax   # Move the immediate value 0x10 into EAX
movb $0xFF, %al    # Move the immediate value 0xFF into AL
movq $0x1234, %rax # Move the immediate value 0x1234 into RAX
Move a value between registers:
movl %eax, %ebx    # Move the value in EAX to EBX
movb %al, %bl      # Move the value in AL to BL
movq %rax, %rbx    # Move the value in RAX to RBX
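
The three differences covered so far (operand order, the $ prefix on immediates, and the % prefix on registers) can be summarised in a toy Python converter. This is a sketch under narrow assumptions: att_to_intel is a hypothetical helper that handles only mov with register or immediate operands and no trailing comment. It is not how any real assembler or disassembler works.

```python
MOV_FORMS = {"mov", "movb", "movw", "movl", "movq"}  # mov with optional size suffix


def strip_prefix(operand: str) -> str:
    """Remove the AT&T marker: % for registers, $ for immediates."""
    if operand.startswith(("%", "$")):
        return operand[1:]
    raise ValueError(f"unsupported operand: {operand!r}")


def att_to_intel(line: str) -> str:
    """Rewrite 'movl $0x10, %eax' as 'mov eax, 0x10' (register/immediate operands only)."""
    mnemonic, operands = line.strip().split(None, 1)
    if mnemonic not in MOV_FORMS:
        raise ValueError(f"unsupported mnemonic: {mnemonic!r}")
    source, destination = [part.strip() for part in operands.split(",")]
    # Intel order is destination first; the register name already implies the size.
    return f"mov {strip_prefix(destination)}, {strip_prefix(source)}"


print(att_to_intel("movl %eax, %ebx"))  # mov ebx, eax
print(att_to_intel("movb $0xFF, %al"))  # mov al, 0xFF
print(att_to_intel("mov %ebx, %eax"))   # mov eax, ebx
```

Dropping the suffix is safe here only because one operand is always a register; a store of an immediate to memory would need an explicit size specifier on the Intel side.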

4️⃣ Memory Addressing:

In AT&T syntax, memory operands are enclosed in parentheses (()), and the format for memory addressing is slightly different. The general pattern is:

displacement(base, index, scale)

Where:

  • displacement: A constant value
  • base: A base register
  • index: An index register
  • scale: A scaling factor (1, 2, 4, or 8)

Examples:

  • Simple Addressing:
movl (%ebx), %eax      # Move the value at the memory address stored in EBX into EAX

movl %ebx, (%eax)      # Move the value in EBX to the memory address stored in EAX
  • Complex Addressing:
movl 4(%ebx), %eax     # Move the value at the memory address (EBX + 4) into EAX

movl (%ebx,%ecx,2), %eax # Move the value at the memory address (EBX + ECX * 2) into EAX

movl 8(%ebx,%ecx,2), %eax # Move the value at the memory address (EBX + ECX * 2 + 8) into EAX
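
The mapping between displacement(base, index, scale) and the bracketed Intel form can be sketched in Python. The helper att_memory_to_intel and its regular expression are illustrative assumptions. They cover only operands with a base register and a non-negative displacement, so forms such as a missing base or a negative displacement are rejected.

```python
import re

# displacement(base, index, scale), where everything except the base is optional.
ATT_MEMORY = re.compile(
    r"^(?P<disp>0x[0-9a-fA-F]+|\d+)?"
    r"\(%(?P<base>\w+)(?:,%(?P<index>\w+)(?:,(?P<scale>[1248]))?)?\)$"
)


def att_memory_to_intel(operand: str) -> str:
    """Rewrite an AT&T memory operand such as 8(%ebx,%ecx,2) in Intel brackets."""
    match = ATT_MEMORY.match(operand.replace(" ", ""))
    if match is None:
        raise ValueError(f"unsupported memory operand: {operand!r}")
    parts = [match.group("base")]
    if match.group("index"):
        scale = match.group("scale") or "1"  # an omitted scale means 1
        parts.append(f"{match.group('index')}*{scale}")
    if match.group("disp"):
        parts.append(match.group("disp"))
    return "[" + " + ".join(parts) + "]"


print(att_memory_to_intel("(%ebx)"))          # [ebx]
print(att_memory_to_intel("4(%ebx)"))         # [ebx + 4]
print(att_memory_to_intel("(%ebx,%ecx,2)"))   # [ebx + ecx*2]
print(att_memory_to_intel("8(%ebx,%ecx,2)"))  # [ebx + ecx*2 + 8]
```

Both notations describe the same address; AT&T simply moves the displacement outside the parentheses and replaces the arithmetic operators with commas.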

5️⃣ Operand Sizes:

In AT&T syntax, the size of the operand is specified using suffixes on the instruction mnemonic. The suffixes used are b for byte, w for word, l for long (double word), q for quad word.

Byte (8 bits):
movb $0xFF, (%eax)  # Move the byte value 0xFF to the memory location pointed by EAX
Word (16 bits):
movw $0xFFFF, (%eax)  # Move the word value 0xFFFF to the memory location pointed by EAX
Double Word (32 bits):
movl $0xFFFFFFFF, (%eax)  # Move the double word value 0xFFFFFFFF to the memory location pointed by EAX
Quad Word (64 bits):
movq $0xFFFFFFFFFFFFFFFF, (%rax)  # Move the quad word value 0xFFFFFFFFFFFFFFFF to the memory location pointed by RAX
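
One detail worth checking when writing a quad-word store: the x86-64 mov to a 64-bit memory operand carries only a 32-bit immediate, which the processor sign-extends to 64 bits. The value 0xFFFFFFFFFFFFFFFF is representable because it is -1 sign-extended. The Python sketch below tests whether a given 64-bit pattern can be written this way. The function name fits_sign_extended_imm32 is hypothetical.

```python
def fits_sign_extended_imm32(value: int) -> bool:
    """True if the 64-bit pattern equals some 32-bit value sign-extended to 64 bits."""
    pattern = value & ((1 << 64) - 1)  # view the value as an unsigned 64-bit pattern
    signed = pattern - (1 << 64) if pattern >> 63 else pattern
    return -(1 << 31) <= signed < (1 << 31)


print(fits_sign_extended_imm32(0xFFFFFFFFFFFFFFFF))  # True  (this is -1)
print(fits_sign_extended_imm32(0x7FFFFFFF))          # True
print(fits_sign_extended_imm32(0xFFFFFFFF))          # False (upper 32 bits would need to be zero)
print(fits_sign_extended_imm32(0x123456789))         # False
```

A value that fails this check is usually loaded into a 64-bit register first and then stored from the register.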

6️⃣ Segment Overrides:

When accessing memory, segment overrides are used to specify a segment register other than the default.

movw %ds:(%bx), %ax # Move the value at the memory address in DS:BX into AX