
In operating system (OS) development, it is crucial to understand the file formats used for the bootloader and the kernel. Several formats are available, and each OS environment tends to favor its own. They organize the sections of a file differently, although the basic principle is the same. The simplest and most straightforward of them is the flat binary format.

❔ What is Flat Binary Format?

A binary file format is a way of encoding data that is not human-readable but can be easily processed by a computer. Unlike text files, which are composed of readable characters, binary files contain data in a format that a machine can interpret directly. This data can represent anything from simple numeric values to complex structures like executable programs and images.

A flat binary is the simplest kind of binary file. Unlike more complex formats such as ELF, PE, or Mach-O, it has no metadata, headers, or sections; it consists solely of raw machine code and data. This format is particularly common in early-stage OS development, bootloaders, and embedded systems, where simplicity and minimal overhead are paramount.

❕ Characteristics of Flat Binary Format

  • No Headers: Flat binaries do not contain any headers or metadata. The file starts directly with executable code or data.
  • Simplicity: The format is extremely simple and easy to generate and parse. It’s just a stream of bytes that the processor can execute directly.
  • Fixed Layout: Since there are no headers or sections, the layout of the code and data within the file is fixed and must be known beforehand.
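To make the "no headers" point concrete, here is a minimal Python sketch that looks for the magic numbers that open a few structured formats. The function name identify_container and the selection of magic numbers are illustrative assumptions, and the check is only a heuristic: a flat binary has no signature of its own, so its first bytes could coincidentally match one of these.

```python
# Leading bytes ("magic numbers") of some structured executable formats.
KNOWN_MAGICS = {
    b"\x7fELF": "ELF",
    b"MZ": "PE (DOS MZ stub)",
    b"\xcf\xfa\xed\xfe": "Mach-O (64-bit, little-endian)",
}


def identify_container(image: bytes) -> str:
    """Return the name of a recognised header, or report that none was found."""
    for magic, name in KNOWN_MAGICS.items():
        if image.startswith(magic):
            return name
    # A flat binary starts directly with code or data, so nothing matches.
    return "no recognised header"


# A flat binary that begins with "cli; hlt" (bytes FA F4) has no header.
print(identify_container(b"\xfa\xf4"))
print(identify_container(b"\x7fELF\x01\x01\x01"))
```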

❕ Usage in OS Development

Flat binary formats are primarily used in:

  1. Bootloaders: The initial code that the processor executes to boot the OS is often in flat binary format. The BIOS or firmware loads this code directly from a specific location on a storage device (like the Master Boot Record on a hard drive) into memory, and the processor then executes it.
  2. Embedded Systems: Many embedded systems use flat binary formats due to their simplicity and minimal overhead. These systems often have limited resources and require efficient, low-level access to hardware.
  3. Early Kernel Development: During the initial stages of kernel development, a flat binary format can be used to simplify loading and execution. As the kernel becomes more complex, it may transition to more sophisticated formats like ELF.
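The bootloader case can be modelled in a few lines of Python. This is only a sketch of what a legacy BIOS does with a boot sector, assuming the conventional load address 0x7C00 and the 0xAA55 signature used later in this article; load_boot_sector and the bytearray standing in for RAM are hypothetical.

```python
BOOT_LOAD_ADDRESS = 0x7C00   # where a legacy BIOS places the boot sector
SECTOR_SIZE = 512


def load_boot_sector(memory: bytearray, sector: bytes) -> int:
    """Copy a boot sector into 'memory' and return the address to jump to."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("a boot sector is exactly 512 bytes")
    # 0xAA55 stored little-endian: byte 510 is 0x55, byte 511 is 0xAA.
    if sector[510:512] != b"\x55\xaa":
        raise ValueError("missing boot signature")
    memory[BOOT_LOAD_ADDRESS:BOOT_LOAD_ADDRESS + SECTOR_SIZE] = sector
    # No header to parse: execution simply starts at the first byte.
    return BOOT_LOAD_ADDRESS


ram = bytearray(0x10000)                       # 64 KiB of pretend RAM
sector = b"\xfa\xf4" + b"\x00" * 508 + b"\x55\xaa"
entry = load_boot_sector(ram, sector)
print(hex(entry))
```

The point of the sketch is that the loader needs no knowledge of the file's contents beyond its size and the signature.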

Advantages

  1. Efficiency: Without headers or sections, flat binaries have no overhead, leading to smaller file sizes and faster loading times.
  2. Simplicity: The lack of complex structures makes flat binaries easy to create and use, especially in environments with limited resources or where simplicity is essential.
  3. Direct Execution: The processor can execute the code directly without any additional processing or parsing, which is crucial for bootloaders and low-level system code. The typical workflow is:
    1. Assemble the source code to get a binary output file.
    2. Optionally concatenate multiple binary files, one after another, to form a single binary.
    3. Load the resulting binary file into memory at a specific address and let the processor execute it. There is no need to parse it.
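The concatenation step above can be sketched in Python as follows. The layout (a 512-byte boot sector followed by a kernel padded to whole sectors) and the names build_image and KERNEL_OFFSET are assumptions chosen for illustration; any loader reading this image would have to hardcode the same offset, because the file itself records nothing about its layout.

```python
SECTOR_SIZE = 512
KERNEL_OFFSET = SECTOR_SIZE   # assumed layout: kernel starts right after the boot sector


def pad_to_sector(data: bytes) -> bytes:
    """Zero-pad 'data' up to the next multiple of the sector size."""
    remainder = len(data) % SECTOR_SIZE
    if remainder:
        data += b"\x00" * (SECTOR_SIZE - remainder)
    return data


def build_image(boot_sector: bytes, kernel: bytes) -> bytes:
    """Join two flat binaries into one image with a fixed, known layout."""
    if len(boot_sector) != SECTOR_SIZE:
        raise ValueError("boot sector must be exactly 512 bytes")
    return boot_sector + pad_to_sector(kernel)


image = build_image(b"\x00" * 510 + b"\x55\xaa", b"\x90" * 10)
print(len(image))
```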

Disadvantages

  1. Lack of Metadata: The absence of headers and metadata means there is no built-in information about the file’s structure, entry points, or segment locations. This information must be hardcoded or provided externally.
  2. Limited Flexibility: Flat binaries are not suitable for complex applications that require dynamic linking, relocation, or extensive debugging information.
  3. Manual Management: Developers must manually manage the layout and organization of the code and data within the file, increasing the risk of errors.

Creating Flat Binary Files

Creating a flat binary file typically involves assembling or compiling source code into machine code. In many development environments, this can be done using assembler and linker tools with options that emit raw binary output.

1 Using an Assembler (e.g., NASM):

Assemblers convert assembly language code into machine code. NASM (Netwide Assembler) is a popular choice for generating flat binaries.

Example:

; boot.asm
BITS 16
ORG 0x7C00

start:
    mov si, hello
    call print_string
    cli
    hlt

print_string:
    mov ah, 0x0E
.print_char:
    lodsb
    cmp al, 0
    je .done
    int 0x10
    jmp .print_char
.done:
    ret

hello db 'Hello, World!', 0

TIMES 510-($-$$) db 0
DW 0xAA55

Assemble with NASM:

nasm -f bin -o boot.bin boot.asm

This command generates the flat binary file boot.bin.
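The padding arithmetic at the end of boot.asm can be reproduced in Python to check it. This is a sketch rather than a replacement for the assembler: pad_to_boot_sector is a hypothetical helper that mirrors TIMES 510-($-$$) db 0 followed by DW 0xAA55, assuming all code and data live in a single section.

```python
import struct

SECTOR_SIZE = 512
BOOT_SIGNATURE = 0xAA55


def pad_to_boot_sector(code: bytes) -> bytes:
    """Zero-fill up to offset 510, then append the 2-byte signature."""
    padding = SECTOR_SIZE - 2 - len(code)      # same value as 510-($-$$)
    if padding < 0:
        raise ValueError("code does not fit in 510 bytes")
    # "<H" packs a 16-bit value little-endian, so 0xAA55 becomes 55 AA.
    return code + b"\x00" * padding + struct.pack("<H", BOOT_SIGNATURE)


sector = pad_to_boot_sector(b"\xfa\xf4")       # cli; hlt
print(len(sector), sector[510:].hex())
```

The same two checks (length 512, last two bytes 55 AA) are a quick way to sanity-check a boot.bin produced by NASM.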

2 Using a Linker Script:

We can produce a flat binary directly from the linker (ld) without generating an intermediate ELF executable. This approach uses a linker script that tells the GNU linker to write raw binary output.

1 Write the Source Code:

Write a program in C or Assembly. For this example, we will use a simple Assembly program.

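; boot.asm (assembled to an ELF object, so ORG is not used here;
; the linker script sets the address instead)
BITS 16
global start      ; Export the symbol named by ENTRY(start)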

start:
    mov si, hello
    call print_string
    cli
    hlt

print_string:
    mov ah, 0x0E
.print_char:
    lodsb
    cmp al, 0
    je .done
    int 0x10
    jmp .print_char
.done:
    ret

hello db 'Hello, World!', 0

TIMES 510-($-$$) db 0  ; Fill the rest of the sector with zeros
DW 0xAA55             ; Boot signature

2 Create a Linker Script:

Create a linker script (link.ld) that defines the memory layout and specifies the output format.

OUTPUT_FORMAT(binary)
ENTRY(start)
SECTIONS {
    . = 0x7C00;
    .text : {
        *(.text)
        *(.data)
        *(.bss)
    }
}

This linker script tells ld to emit raw binary output and names start as the entry point. The assignment . = 0x7C00 sets the location counter, so all addresses are computed as if the code were loaded at 0x7C00, the address at which the BIOS loads a boot sector.

The linker script (link.ld) is crucial as it directs ld to output a flat binary file. The OUTPUT_FORMAT(binary) directive is what ensures the output is a raw binary.

3 Assemble the Source Code:

Use an assembler like NASM to assemble the source code into an object file.

nasm -f elf32 -o boot.o boot.asm

This command produces boot.o, a 32-bit ELF object file.

4 Link the Object File:

Use ld with the linker script to produce a flat binary file.

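ld -m elf_i386 -T link.ld -o boot.bin boot.o

This command links the object file using the provided linker script and produces the flat binary boot.bin. The -m elf_i386 option selects the 32-bit x86 emulation, which is needed when ld defaults to a 64-bit target.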

3 Using a Compiler (e.g., GCC) with Linker Scripts

Compilers like GCC can produce flat binaries using custom linker scripts to control the output format.

Example:

1 Source Code (main.c):

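void main(void) {
    /* 0xB8000 is the start of the VGA text-mode buffer. */
    volatile char *video = (volatile char*)0xb8000;
    *video = 'H';   /* Write 'H' to the first character cell */
}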

2 Linker Script (link.ld):

ENTRY(main)
SECTIONS {
    . = 0x1000;
    .text : { *(.text) }
    .data : { *(.data) }
    .bss : { *(.bss) }
}

3 Compilation and Linking:

gcc -ffreestanding -c main.c -o main.o
ld -T link.ld -o main.elf main.o
objcopy -O binary main.elf main.bin

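Here, gcc and ld first produce an ELF executable, and objcopy then copies only the contents of its loadable sections into a flat binary, discarding the ELF headers and symbol tables.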
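What this conversion amounts to can be approximated with a small Python model. It is a simplified sketch, not objcopy's actual implementation: the hypothetical flatten function takes (address, contents) pairs standing in for loadable sections, starts the image at the lowest address, and zero-fills any gaps.

```python
from typing import Sequence, Tuple


def flatten(sections: Sequence[Tuple[int, bytes]]) -> bytes:
    """Lay sections out by address into one headerless byte image."""
    if not sections:
        return b""
    base = min(addr for addr, _ in sections)
    end = max(addr + len(data) for addr, data in sections)
    image = bytearray(end - base)              # gaps stay zero-filled
    for addr, data in sections:
        offset = addr - base
        image[offset:offset + len(data)] = data
    return bytes(image)


# Assumed example: .text at 0x1000 and .data at 0x1008.
flat = flatten([(0x1000, b"\x90\x90"), (0x1008, b"AB")])
print(flat.hex())
```

Note that the base address 0x1000 does not appear anywhere in the result; whoever loads the image has to know it.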

Instructions Specific to Flat Binary Format

When creating a flat binary file using NASM, certain instructions and directives are specific to this format. Here are key instructions and directives that are particularly relevant:

1 ORG Directive:

The ORG (origin) directive tells the assembler the memory address at which the code will be loaded. This is particularly important in flat binary formats, where there is no built-in metadata to indicate the load address. ORG does not move the code or control where it is loaded; it only makes the assembler compute addresses as if the first byte of the file sat at the given address, so the value must match the address the loader actually uses.

Role of ORG in Flat Binary Assembly Files:

  1. Declaring the Load Address: The ORG directive tells the assembler the address at which the code or data will be loaded into memory. This is crucial for flat binaries because there are no headers to specify the load address.
  2. Correct Address Calculation: With ORG, the assembler can correctly calculate the absolute addresses of labels relative to the specified origin. This matters for absolute references such as mov si, hello; near jumps and calls are encoded relative to the current instruction and are unaffected.
  3. Not an Alignment Tool: In NASM, ORG may appear only once per file and does not emit any padding. To align code or data to a specific boundary, use ALIGN or TIMES instead.
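The effect of the origin on an absolute reference can be shown by encoding one instruction by hand. The sketch below builds the bytes for mov si, label (opcode 0xBE followed by a 16-bit little-endian immediate); encode_mov_si_label and the label offset 0x13 are illustrative assumptions.

```python
import struct


def encode_mov_si_label(origin: int, label_offset: int) -> bytes:
    """Encode 'mov si, label' where label sits 'label_offset' bytes into the file."""
    address = (origin + label_offset) & 0xFFFF   # what the assembler computes
    return b"\xbe" + struct.pack("<H", address)  # BE imm16 = mov si, imm16


# Same label, two different origins: only the immediate changes.
print(encode_mov_si_label(0x7C00, 0x13).hex())
print(encode_mov_si_label(0x0000, 0x13).hex())
```

With the wrong origin the instruction still assembles, but SI points 0x7C00 bytes away from the string once the sector is loaded at 0x7C00 with a zero segment base.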

Can We Use the section .text, .data, and .bss Directives in the Flat Binary Format?

Flat binaries don't inherently support sections like .text, .data, or .bss as more complex formats do; the output file keeps no record of them. NASM's bin output still accepts these directives and simply places the sections one after another, so using them in your assembly code can be beneficial for structuring your program logically.

; bootloader.asm
BITS 16           ; Set the CPU mode to 16-bit
ORG 0x7C00        ; Set the origin to 0x7C00

section .text     ; Define the text section
start:
    mov si, hello
    call print_string
    cli            ; Clear interrupts
    hlt            ; Halt the CPU

print_string:
    mov ah, 0x0E
.print_char:
    lodsb
    cmp al, 0
    je .done
    int 0x10
    jmp .print_char
.done:
    ret

section .data     ; Define the data section
hello db 'Hello, World!', 0

section .bss      ; Define the bss section (uninitialized data, not stored in the file)
buffer resb 16    ; Reserve 16 bytes (illustrative; unused here)

section .bootsig start=0x7DFE  ; Fixed address 0x7C00 + 510 (the section name is arbitrary)
    dw 0xAA55                  ; Boot signature

Explanation of the Sections:

  • .text Section: This section contains the executable code. By defining it explicitly, we ensure that the code is organized logically.
  • .data Section: This section contains initialized data. In this example, it contains the string "Hello, World!".
  • .bss Section: This section reserves space for uninitialized data. NASM's bin format does not write it to the file, and data directives such as db or dw placed in it are ignored with a warning, so it cannot be used to pad the sector or to hold the boot signature.
  • .bootsig Section: Giving the boot signature its own section with an explicit start address places it at offset 510 of the file; NASM fills the gap before it with zeros.

Advantages of Using Sections:

  1. Organization: Using sections helps keep your code organized and modular. It’s easier to manage and understand large codebases.
  2. Transition to Complex Formats: If you plan to transition from a flat binary format to a more complex format like ELF in the future, starting with sections in your assembly code makes this transition smoother.

📝 Note

Conflict between section .data and the TIMES directive:

The TIMES directive repeats an instruction or data definition a given number of times and is commonly used to fill space with a specific value. When using sections like .text, .data, and .bss in your assembly code, the result of the usual padding expression changes, which affects the overall layout of the flat binary file.

In the expression 510-($-$$), $ is the current address and $$ is the start address of the current section, not of the whole file. With a single section the two coincide, so $-$$ is the offset from the start of the image. Once a .data section follows .text, $-$$ inside .data counts only the bytes of .data, so the padding is too long and the boot signature lands beyond offset 510.

section .text
start:
    mov si, hello
    call print_string
    cli
    hlt

print_string:
    mov ah, 0x0E
.print_char:
    lodsb
    cmp al, 0
    je .done
    int 0x10
    jmp .print_char
.done:
    ret

section .data
hello db 'Hello, World!', 0

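; $$ now refers to the start of .data, so "TIMES 510-($-$$) db 0" here would
; pad relative to .data rather than the start of the image. Placing the
; signature in its own section at a fixed address avoids the problem
; (assumes ORG 0x7C00; .bootsig is an arbitrary name).
section .bootsig start=0x7DFE
    dw 0xAA55                  ; Boot signature at file offset 510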
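The arithmetic behind this conflict can be checked with a short Python model. It is a sketch under stated assumptions: .data follows .text in the file, sections are aligned to 4 bytes (NASM's documented default for the bin format), and the section lengths are made-up example values.

```python
def align_up(value: int, alignment: int) -> int:
    """Round 'value' up to the next multiple of 'alignment'."""
    return (value + alignment - 1) // alignment * alignment


def image_size_with_naive_padding(text_len: int, data_len: int, align: int = 4) -> int:
    """File size if 'TIMES 510-($-$$) db 0' plus 'DW 0xAA55' is placed in .data."""
    data_start = align_up(text_len, align)
    padding = 510 - data_len          # $-$$ only counts bytes inside .data
    return data_start + data_len + padding + 2


def correct_padding(text_len: int, data_len: int, align: int = 4) -> int:
    """Padding needed so the signature really starts at file offset 510."""
    return 510 - (align_up(text_len, align) + data_len)


# Example: 21 bytes of code, 14 bytes of data ("Hello, World!" plus NUL).
print(image_size_with_naive_padding(21, 14))
print(correct_padding(21, 14))
```

The naive version overshoots the 512-byte sector by exactly the space that .text occupies, which is why the signature ends up in the wrong place.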