In operating system (OS) development, one crucial aspect is understanding the file formats used for the bootloader and the kernel. Many formats are available, and every OS environment has its own. They store the sections of a file differently, although the basic principle is the same. The most straightforward of them all is the Flat Binary Format.
❔ What is Flat Binary Format?
A binary file format is a way of encoding data that is not human-readable but can be easily processed by a computer. Unlike text files, which are composed of readable characters, binary files contain data in a format that a machine can interpret directly. This data can represent anything from simple numeric values to complex structures like executable programs and images.
Unlike more complex formats such as ELF, PE, or Mach-O, a flat binary file has no metadata, headers, or sections; it consists solely of raw machine code and data. This format is particularly common in early-stage OS development, bootloaders, and embedded systems, where simplicity and minimal overhead are paramount.
❕ Characteristics of Flat Binary Format
- No Headers: Flat binaries do not contain any headers or metadata. The file starts directly with executable code or data.
- Simplicity: The format is extremely simple and easy to generate and parse. It’s just a stream of bytes that the processor can execute directly.
- Fixed Layout: Since there are no headers or sections, the layout of the code and data within the file is fixed and must be known beforehand.
❕ Usage in OS Development
Flat binary formats are primarily used in:
- Bootloaders: The initial code that the processor executes to boot the OS is often in flat binary format. This code is typically loaded by the BIOS or firmware from a specific location on a storage device (like the Master Boot Record on a hard drive) and then executed directly.
- Embedded Systems: Many embedded systems use flat binary formats due to their simplicity and minimal overhead. These systems often have limited resources and require efficient, low-level access to hardware.
- Early Kernel Development: During the initial stages of kernel development, a flat binary format can be used to simplify loading and execution. As the kernel becomes more complex, it may transition to more sophisticated formats like ELF.
Advantages
- Efficiency: Without headers or sections, flat binaries have no overhead, leading to smaller file sizes and faster loading times.
- Simplicity: The lack of complex structures makes flat binaries easy to create and use, especially in environments with limited resources or where simplicity is essential.
- Direct Execution: The processor can execute the code directly without any additional processing or parsing, which is crucial for bootloaders and low-level system code.
A typical workflow looks like this:
- Assemble the source code to get a binary output file.
- Concatenate multiple binary files, one after another, to form a single binary.
- Load the resulting binary into memory at a specific address and let the processor execute it. No parsing is needed.
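Because nothing has to be parsed, loading a flat binary amounts to a byte copy to an agreed address. The C sketch below illustrates this on a simulated memory array. The names machine_t and load_flat_binary are hypothetical and introduced only for this illustration; a real loader would copy into physical memory and then jump to the returned address.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simulated machine memory (hypothetical type used only in this sketch). */
typedef struct {
    uint8_t *memory; /* start of the simulated memory */
    size_t   size;   /* size of the simulated memory in bytes */
} machine_t;

/* Copies the image to load_addr. There is no header to read, so the entry
 * point is simply the first byte of the image. Returns the entry address,
 * or -1 if the image does not fit into memory. */
long load_flat_binary(machine_t *m, const uint8_t *image, size_t len,
                      size_t load_addr)
{
    if (load_addr > m->size || len > m->size - load_addr)
        return -1;
    memcpy(m->memory + load_addr, image, len);
    return (long)load_addr;
}
```

The caller has to know load_addr in advance, since the file itself carries no hint about it.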
Disadvantages
- Lack of Metadata: The absence of headers and metadata means there is no built-in information about the file’s structure, entry points, or segment locations. This information must be hardcoded or provided externally.
- Limited Flexibility: Flat binaries are not suitable for complex applications that require dynamic linking, relocation, or extensive debugging information.
- Manual Management: Developers must manually manage the layout and organization of the code and data within the file, increasing the risk of errors.
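Since the file carries no metadata, layout facts usually end up as constants shared between the build and the loader. The following C sketch shows one way this could look for a disk image in which a second stage is concatenated right after the boot sector. All constants and function names here are illustrative assumptions, not values from a particular project.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout facts that a header would normally carry (assumed values). */
#define SECTOR_SIZE      512u
#define STAGE2_FIRST_LBA 1u      /* stage 2 starts right after the boot sector */
#define STAGE2_LOAD_ADDR 0x7E00u /* hypothetical load address for stage 2 */

/* Number of whole sectors the loader must read to cover len bytes. */
uint32_t sectors_needed(size_t len)
{
    return (uint32_t)((len + SECTOR_SIZE - 1) / SECTOR_SIZE);
}

/* Byte offset of a sector inside the concatenated disk image. */
size_t image_offset(uint32_t lba)
{
    return (size_t)lba * SECTOR_SIZE;
}
```

If the second stage grows past a sector boundary, nothing in the file records that; the sector count has to be recomputed and passed to the loader by hand or by the build script.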
Creating Flat Binary Files
Creating a flat binary file typically involves compiling source code into machine code. In many development environments, this can be done using assembler and linker tools with specific options to generate raw binary output.
1 Using an Assembler (e.g., NASM):
Assemblers convert assembly language code into machine code. NASM (Netwide Assembler) is a popular choice for generating flat binaries.
Example:
; boot.asm
BITS 16
ORG 0x7C00
start:
xor ax, ax
mov ds, ax ; The BIOS does not guarantee DS, so set it to 0 to match ORG 0x7C00
mov si, hello
call print_string
cli
hlt
print_string:
mov ah, 0x0E
.print_char:
lodsb
cmp al, 0
je .done
int 0x10
jmp .print_char
.done:
ret
hello db 'Hello, World!', 0
TIMES 510-($-$$) db 0
DW 0xAA55
Compile with NASM:
nasm -f bin -o boot.bin boot.asm
This command generates a flat binary file, boot.bin.
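A quick way to sanity-check the result is to confirm that the file is exactly one 512-byte sector and ends with the boot signature. The C sketch below checks a buffer that is assumed to already hold the contents of boot.bin; is_boot_sector is a hypothetical helper name. Note that DW 0xAA55 is stored little-endian, so the bytes on disk are 0x55 followed by 0xAA.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BOOT_SECTOR_SIZE 512u

/* True if the buffer is exactly one sector long and its last two bytes
 * are 0x55, 0xAA (the little-endian encoding of the word 0xAA55). */
bool is_boot_sector(const uint8_t *image, size_t len)
{
    return len == BOOT_SECTOR_SIZE &&
           image[510] == 0x55 &&
           image[511] == 0xAA;
}
```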
2 Using a Linker Script:
We can also produce a flat binary directly from the GNU linker (ld) without generating an intermediate ELF executable. This approach uses a linker script that tells ld to emit binary output.
1 Write the Source Code:
Write a program in C or Assembly. For this example, we will use a simple Assembly program.
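; boot.asm
BITS 16
; No ORG here: NASM accepts ORG only with -f bin. The linker script supplies 0x7C00.
global start ; Export the entry symbol named in ENTRY(start)
section .text
start:
xor ax, ax
mov ds, ax ; DS = 0 so that DS:SI reaches hello
mov si, hello
call print_string
cli
hlt
print_string:
mov ah, 0x0E
.print_char:
lodsb
cmp al, 0
je .done
int 0x10
jmp .print_char
.done:
ret
hello db 'Hello, World!', 0
TIMES 510-($-$$) db 0 ; Fill the rest of the sector with zeros
DW 0xAA55 ; Boot signature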
2 Create a Linker Script:
Create a linker script (link.ld) that defines the memory layout and specifies the output format.
OUTPUT_FORMAT(binary)
ENTRY(start)
SECTIONS {
. = 0x7C00;
.text : {
*(.text)
*(.data)
*(.bss)
}
}
This linker script tells ld to output in binary format and names start as the entry point. The assignment to . sets the location counter to 0x7C00, the address at which the BIOS loads a boot sector.
The OUTPUT_FORMAT(binary) directive is what makes the output a raw binary rather than an ELF file.
3 Assemble the Source Code:
Use an assembler like NASM to assemble the source code into an object file.
nasm -f elf32 -o boot.o boot.asm
This command produces boot.o, an object file in the ELF32 format.
4 Link the Object File:
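Use ld with the linker script to produce a flat binary file.
ld -m elf_i386 -T link.ld -o boot.bin boot.o
This command links the object file using the provided linker script and produces the flat binary boot.bin. The -m elf_i386 option selects the 32-bit x86 emulation, which a 64-bit host toolchain typically needs in order to accept an elf32 object file.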
3 Using a Compiler (e.g., GCC) with Linker Scripts
Compilers like GCC can produce flat binaries using custom linker scripts to control the output format.
Example:
1 Source Code (main.c):
void main() {
volatile char *video = (volatile char*)0xb8000;
*video = 'H';
}
2 Linker Script (link.ld):
ENTRY(main)
SECTIONS {
. = 0x1000;
.text : { *(.text) }
.data : { *(.data) }
.bss : { *(.bss) }
}
3 Compilation and Linking:
gcc -ffreestanding -c main.c -o main.o
ld -T link.ld -o main.elf main.o
objcopy -O binary main.elf main.bin
Here, objcopy converts the ELF file to a flat binary.
Instructions Specific to Flat Binary Format
When creating a flat binary file using NASM, certain instructions and directives are specific to this format. Here are key instructions and directives that are particularly relevant:
1 ORG Directive:
The ORG (origin) directive in an assembly file specifies the address at which the code or data will start in memory. This is particularly important in flat binary formats, where there is no built-in metadata to indicate where the code should be loaded and executed. ORG does not move or load anything itself; it tells the assembler which load address to assume, so that the addresses it computes match the place where the loader actually puts the file.
Role of ORG in Flat Binary Assembly Files:
- Setting the Load Address: The ORG directive tells the assembler the address at which the code or data will be loaded into memory. This is crucial for flat binaries because there are no headers to specify the load address.
- Correct Address Calculation: With ORG, the assembler can correctly calculate the addresses of labels relative to the specified origin. This ensures that any absolute address used in the code (such as the address of hello loaded into SI) points to the correct memory location. Near jumps and calls are encoded relative to the current instruction, so they work regardless of the origin.
- Aligning Code and Data: In some cases, you might need to align your code or data to specific memory boundaries. In NASM, ORG does not insert padding or move code; padding comes from directives such as ALIGN or TIMES, and choosing an origin that is itself suitably aligned keeps offsets in the file and addresses in memory aligned together.
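The address calculation behind ORG is a single addition: the address of a label is the origin plus the label's byte offset from the start of the file. The C sketch below spells this out. The function name label_address and the offset 0x12 used in the follow-up are made up for illustration.

```c
#include <stdint.h>

/* Address the assembler substitutes for a label: the origin given to ORG
 * plus the label's byte offset from the start of the file. */
uint16_t label_address(uint16_t origin, uint16_t file_offset)
{
    return (uint16_t)(origin + file_offset);
}
```

With ORG 0x7C00 and a string at offset 0x12, mov si, hello would load 0x7C12. Without ORG the assembler assumes an origin of 0 and would load 0x0012, which points at the wrong memory once the BIOS has placed the sector at 0x7C00 and DS is 0.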
Can we use the section .text, .data, and .bss directives in the Flat Binary Format?
A flat binary has no sections of its own, unlike more complex formats, but using .text, .data, and .bss in your assembly code can still help structure the program logically.
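; bootloader.asm
BITS 16 ; Set the CPU mode to 16-bit
ORG 0x7C00 ; Set the origin to 0x7C00
section .text ; Define the text section
start:
xor ax, ax
mov ds, ax ; DS = 0 so that DS:SI reaches hello
mov si, hello
call print_string
cli ; Clear interrupts
hlt ; Halt the CPU
print_string:
mov ah, 0x0E
.print_char:
lodsb
cmp al, 0
je .done
int 0x10
jmp .print_char
.done:
ret
text_end: ; End of the code, used to size the padding
section .data align=1 ; Define the data section directly after .text
hello db 'Hello, World!', 0
TIMES 510 - (text_end - start) - ($ - $$) db 0 ; Pad code + data to 510 bytes
DW 0xAA55 ; Boot signature
section .bss ; Define the bss section (uninitialized data)
scratch resb 16 ; Hypothetical buffer; reserved at run time, adds no bytes to the file
Explanation of the Sections:
- .text Section: This section contains the executable code. By defining it explicitly, we ensure that the code is organized logically.
- .data Section: This section contains initialized data. In this example, it contains the string "Hello, World!", followed by the padding and the boot signature. They are placed here because .data is the last section that is written to the file, and the padding expression subtracts the size of .text because $-$$ only measures the current section.
- .bss Section: This section is for uninitialized data. NASM reserves address space for it but writes no bytes to the flat binary, so it cannot hold the padding or the boot signature. In more complex programs, it would be used for variables that the startup code zeroes at runtime.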
Advantages of Using Sections:
- Organization: Using sections helps keep your code organized and modular. It’s easier to manage and understand large codebases.
- Transition to Complex Formats: If you plan to transition from a flat binary format to a more complex format like ELF in the future, starting with sections in your assembly code makes this transition smoother.
📝 Note
Conflict between sections and the TIMES directive:
The TIMES directive repeats an instruction or data definition, which makes it useful for filling space with a value. When you use sections like .text, .data, and .bss, the result of a TIMES-based padding expression can change, and with it the overall layout of the flat binary file.
The expression 510-($-$$) depends on the current position (denoted by $) and the start of the current section (denoted by $$), so it measures only the section in which it appears. If .data is placed after .text, padding computed in .data ignores the bytes already emitted in .text, so the file ends up longer than 512 bytes and the signature lands at the wrong offset. Padding written in .bss fares worse: .bss reserves address space but contributes no bytes to the output file, so neither the zeros nor the signature reach boot.bin. The padding must therefore account for every initialized section and must itself be emitted in an initialized section.
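section .text
start:
xor ax, ax
mov ds, ax ; DS = 0 so that DS:SI reaches hello
mov si, hello
call print_string
cli
hlt
print_string:
mov ah, 0x0E
.print_char:
lodsb
cmp al, 0
je .done
int 0x10
jmp .print_char
.done:
ret
text_end: ; End of the code, used to size the padding
section .data align=1 ; Place .data directly after .text, with no alignment gap
hello db 'Hello, World!', 0
; No padding in .bss: that section writes no bytes to the file.
TIMES 510 - (text_end - start) - ($ - $$) db 0 ; Pad code + data to 510 bytes
DW 0xAA55 ; Boot signature in bytes 510 and 511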
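The padding arithmetic can be checked independently of the assembler. The C sketch below computes how many zero bytes are needed once the sizes of the code and the initialized data are known. The function name boot_padding and the example sizes are assumptions made for illustration only.

```c
#include <stddef.h>

#define SIGNATURE_OFFSET 510 /* 0xAA55 occupies bytes 510 and 511 */

/* Padding bytes needed after text_len bytes of code and data_len bytes of
 * data. Returns -1 when the two together do not fit before the signature. */
long boot_padding(size_t text_len, size_t data_len)
{
    size_t used = text_len + data_len;
    if (used > SIGNATURE_OFFSET)
        return -1;
    return (long)(SIGNATURE_OFFSET - used);
}
```

For example, 22 bytes of code and 14 bytes of data would leave 474 bytes of padding. A padding expression that only sees the 14 bytes of data would emit 496 bytes instead and push the signature past the end of the sector.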