OS Kernel Format
The kernel, which is the core component of an operating system, can be packaged and stored in different file formats for various purposes. The two main formats discussed here are flat binary (bin) files and ELF (Executable and Linkable Format) files. Here's a detailed comparison of having a kernel in a binary file versus an ELF file:
1 Flat Binary (Bin) File
Characteristics:
- Format: A flat binary file is a raw sequence of bytes that a bootloader or firmware can load directly into memory and execute.
- Structure: It has no headers or metadata describing its contents. Execution starts at a location agreed on in advance (typically the first byte of the image), and the file carries no information about segments or sections.
Concept of Sections:
In a pure binary file format (commonly referred to as a "flat binary"), there is no explicit concept of sections like .text, .data, or .bss as you would find in more complex executable formats such as ELF or PE. Instead, all code and data are simply placed sequentially in the file, and it is up to the programmer to manage the memory layout.
Advantages:
- Simplicity: The format is straightforward, consisting of raw binary data that the system can directly load and execute.
- Bootloaders: Some bootloaders or firmware may prefer or require raw binary images because they can be simpler to handle and parse.
- Minimal Overhead: Without additional headers or metadata, the file size can be smaller, which might be beneficial in resource-constrained environments.
Disadvantages
- Lack of Metadata: Since binary files lack headers, there is no embedded information about the file structure, making it harder to perform certain tasks like debugging, symbol resolution, or dynamic linking.
- Limited Flexibility: The absence of segments and section information can make it less flexible for certain advanced uses, such as dynamic loading or linking of kernel modules.
2 ELF (Executable and Linkable Format) File
Characteristics
- Format: ELF files follow a standard layout: an ELF header, a program header table, and a section header table that together describe the file's structure and contents.
- Structure: Contains detailed information about different sections (code, data, BSS) and segments, along with metadata such as symbol tables and relocation information.
Advantages
- Rich Metadata: ELF files contain extensive metadata, including information about sections and segments, which aids in debugging, dynamic linking, and symbol resolution.
- Flexibility: The ELF format supports dynamic linking and loading of shared libraries or kernel modules, providing greater flexibility and modularity.
- Tooling Support: Many development and debugging tools are designed to work with ELF files, making it easier to develop and maintain the kernel.
- Debugging: The inclusion of symbol tables and other debugging information makes it easier to diagnose and fix issues within the kernel.
Disadvantages
- Complexity: The format is more complex, which might require more sophisticated bootloaders that can parse ELF headers and load the kernel properly.
- Overhead: The additional headers and metadata can increase the file size slightly, although this is often negligible compared to the benefits provided.
ELF File
An ELF (Executable and Linkable Format) file is a common standard file format used for executables, object code, shared libraries, and core dumps. It is widely used in Unix and Unix-like operating systems, such as Linux.
Key Components of an ELF File
1 Header:
- Provides metadata about the file.
- Includes the file type (executable, shared object, etc.), target architecture, entry point address, and offsets to other parts of the file.
2 Program Header Table:
- Describes segments used at runtime.
- Contains information needed for memory mapping, such as segment sizes and access permissions.
3 Section Header Table:
- Describes the sections that make up the file.
- Each section contains different types of data, such as code, data, symbol tables, and debugging information.
4 Sections:
- Various parts of the file, including:
  - .text: Contains executable code.
  - .data: Contains initialized data.
  - .bss: Contains uninitialized data (takes up no space in the file; it is zero-filled when loaded).
  - .rodata: Contains read-only data.
  - .symtab and .strtab: Symbol table and string table for symbol names.
  - .debug_* (for example .debug_info and .debug_line): Contain DWARF debugging information.
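The header fields described above sit at fixed offsets after the 16-byte identification array (e_ident), so they can be decoded with Python's standard struct module. The following is a minimal sketch that assumes a well-formed ELF32 or ELF64 file. The helper name parse_elf_header is hypothetical, and only the fixed-size ELF header is decoded, not the program or section header tables it points to.

```python
import struct

ELF_MAGIC = b"\x7fELF"

# Field layout after the 16-byte e_ident array.
# 32-bit files use 4-byte addresses/offsets, 64-bit files use 8-byte ones.
_LAYOUT = {
    1: "HHIIIIIHHHHHH",  # ELFCLASS32
    2: "HHIQQQIHHHHHH",  # ELFCLASS64
}
_NAMES = ("e_type", "e_machine", "e_version", "e_entry", "e_phoff", "e_shoff",
          "e_flags", "e_ehsize", "e_phentsize", "e_phnum",
          "e_shentsize", "e_shnum", "e_shstrndx")


def parse_elf_header(data: bytes) -> dict:
    """Decode the ELF header at the start of `data` into a dict of fields."""
    if len(data) < 16 or data[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file (bad magic number)")
    ei_class, ei_data = data[4], data[5]
    if ei_class not in _LAYOUT or ei_data not in (1, 2):
        raise ValueError("unsupported ELF class or byte order")
    endian = "<" if ei_data == 1 else ">"  # EI_DATA: 1 = little-endian, 2 = big-endian
    fmt = endian + _LAYOUT[ei_class]
    if len(data) < 16 + struct.calcsize(fmt):
        raise ValueError("truncated ELF header")
    fields = dict(zip(_NAMES, struct.unpack_from(fmt, data, 16)))
    fields["bits"] = 32 if ei_class == 1 else 64
    return fields
```

For example, passing the first 64 bytes of an executable on a Linux system returns its entry point in e_entry. It also returns the offsets and counts of the program header table (e_phoff, e_phnum) and the section header table (e_shoff, e_shnum).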
ELF File Types
1 Executable Files:
- These files contain code and data for programs that can be executed by the operating system.
2 Shared Libraries:
- These files contain code and data that can be used by multiple programs simultaneously, reducing memory and disk space usage.
3 Object Files:
- Intermediate files produced during the compilation process.
- They are later linked to form executables or shared libraries.
4 Core Dumps:
- These files are snapshots of a program's memory and state at a specific point, typically when a program crashes.
- They are used for debugging purposes.
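These four file types are recorded in the e_type field of the ELF header, a 16-bit value stored immediately after the 16-byte e_ident array. Below is a minimal sketch (elf_file_type is a hypothetical helper) that reads only that field. One caveat: position-independent executables on modern Linux systems are usually marked ET_DYN, so "shared object" does not always mean a library.

```python
import struct

# e_type values defined by the ELF specification.
ELF_TYPES = {
    0: "ET_NONE (no file type)",
    1: "ET_REL (relocatable object file)",
    2: "ET_EXEC (executable)",
    3: "ET_DYN (shared object)",
    4: "ET_CORE (core dump)",
}


def elf_file_type(data: bytes) -> str:
    """Describe the ELF file type stored in the header's e_type field."""
    if len(data) < 18 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    endian = "<" if data[5] == 1 else ">"  # EI_DATA: 1 = little-endian, 2 = big-endian
    (e_type,) = struct.unpack_from(endian + "H", data, 16)  # e_type follows e_ident
    return ELF_TYPES.get(e_type, f"unknown ({e_type:#x})")
```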
Difference between Flat Binary and ELF Format:
Feature | Flat Binary | ELF (Executable and Linkable Format) |
---|---|---|
Purpose | Simple, direct execution of code | Advanced, structured format for executables, libraries, and object files |
Structure | Sequential, raw byte layout | Structured with headers, sections, and segments |
Sections | No explicit sections; all code and data are sequentially placed. | Supports multiple sections like .text, .data, .bss |
Metadata | Minimal to none (raw bytes) | Contains detailed metadata in headers |
Executable Header | No header, just raw instructions/data | ELF header (metadata about the file) |
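Portability | Tied to a fixed load address and memory layout; carries no architecture information | Standard container used on many architectures; the header records the target machine (the machine code itself is still architecture-specific) |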
Linking | Typically does not support linking of multiple files | Supports static and dynamic linking |
Loading Mechanism | Simple: copy to a fixed memory address and jump | More involved: parse the ELF header and program headers, then load each segment to its address |
Debugging Information | None | Can contain debugging information (e.g., symbol tables, debug sections) |
Relocation | Not supported | Supports relocation entries (used when linking object files and when loading shared libraries) |
Symbol Resolution | No support | Supports symbol tables for resolving function and variable names |
Size Overhead | Minimal (just the code and data) | Some overhead due to headers and additional sections |
Error Handling | Minimal: the bootloader can only check that the disk read succeeded (e.g., BIOS error codes) | The loader can validate the header (magic number, class, machine type) before loading |
Usage Scenarios | Embedded systems, bootloaders, minimal systems | General-purpose operating systems, complex applications |
Tools Support | Limited to basic assembly and concatenation tools | Wide range of tools (linkers, debuggers, profilers) |
File Extensions | .bin, .img (by convention only) | .o, .so, .elf, or no extension for executables; .a archives bundle ELF object files |
Executable Identification | Not identifiable by content alone | Identifiable by ELF magic number (0x7F 'E' 'L' 'F') |
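The last row of the table can be made concrete: an ELF file is recognized by its first four bytes, while a flat binary carries no marker at all. A BIOS boot sector is a special case. By convention it ends with the word 0xAA55, which is stored little-endian as the bytes 0x55, 0xAA at offsets 510 and 511. This is a rule for boot sectors only, not for flat binaries in general. Below is a minimal sketch; identify_image is a hypothetical helper name.

```python
def identify_image(data: bytes) -> str:
    """Guess the kind of image from its content alone."""
    if data[:4] == b"\x7fELF":
        return "ELF"
    # `dw 0xAA55` is stored little-endian: byte 0x55 at offset 510, 0xAA at 511.
    if len(data) >= 512 and data[510:512] == b"\x55\xaa":
        return "boot sector"
    return "unknown (possibly a flat binary)"
```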
Loading the Binary Kernel
As discussed above, a flat binary has no headers and is ready to execute as-is. Loading a binary kernel therefore only requires copying it into memory and jumping to the address where it was loaded.
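To illustrate how little a flat-binary loader has to do, here is a minimal simulation that assumes memory is modeled as a Python bytearray. The function name load_flat_binary is hypothetical. The image is copied verbatim because there is no header to interpret, and the entry point is simply the load address.

```python
def load_flat_binary(memory: bytearray, image: bytes, load_addr: int) -> int:
    """Copy a flat binary into simulated memory and return its entry address."""
    end = load_addr + len(image)
    if end > len(memory):
        raise ValueError("image does not fit in memory")
    memory[load_addr:end] = image  # no headers to parse: bytes are copied as-is
    return load_addr               # execution starts at the first loaded byte
```

For comparison, an ELF loader would first read the header and program headers to learn where each segment goes and where the entry point is.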
Example 1: Single File Kernel:
Let's create a simple kernel and bootloader in assembly language. Both are assembled as flat binaries, with no headers. The bootloader loads the kernel into memory and jumps to it to execute its code.
Step 1: Create the Bootloader
The bootloader will reside in the first sector of the disk (512 bytes) and will load the kernel from the second sector onward.
bootloader.asm:
[BITS 16] ; We are writing 16-bit real mode code
[ORG 0x7C00] ; BIOS loads the bootloader at address 0x7C00
start:
; Set up segment registers and a stack just below the bootloader
cli
xor ax, ax
mov ds, ax
mov es, ax
mov ss, ax
mov sp, 0x7C00
sti
cld ; Make lodsb move forward through the string
; Load the second sector (first sector of the kernel) to 0x0000:0x1000 (physical 0x1000)
mov ah, 0x02 ; BIOS read sectors function
mov al, 1 ; Number of sectors to read
mov ch, 0 ; Cylinder number
mov cl, 2 ; Sector number (sectors start at 1, so this is the second sector)
mov dh, 0 ; Head number
; DL still holds the boot drive number passed in by the BIOS (0x80 for the first hard disk)
mov bx, 0x1000 ; ES:BX = 0x0000:0x1000 is the load address
int 0x13 ; Call BIOS interrupt
jc load_error ; If carry flag is set, jump to load_error
; Jump to kernel entry point: CS:IP = 0x0000:0x1000 matches the load address and the kernel's ORG
jmp 0x0000:0x1000
load_error:
; Print error message and halt
xor bx, bx ; BH = 0: teletype output goes to display page 0
mov si, error_msg
print_loop:
lodsb
cmp al, 0
je halt
mov ah, 0x0E
int 0x10
jmp print_loop
halt:
hlt
jmp halt
error_msg db 'Error loading kernel!', 0
times 510-($-$$) db 0
dw 0xAA55 ; Boot signature
This code loads the second sector of the disk, which contains the kernel code, to physical address 0x1000 and then jumps to it.
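Two pieces of arithmetic are hidden in the bootloader. First, real-mode addresses are segment:offset pairs that translate to physical = segment × 16 + offset. ES:BX = 0x0000:0x1000 therefore loads the sector at physical 0x1000. A far jump must also use a pair that translates to 0x1000, such as 0x0000:0x1000, because 0x1000:0x0000 translates to 0x10000, a different address. Second, INT 0x13 AH=0x02 takes a cylinder/head/sector (CHS) triple with sectors numbered from 1. Cylinder 0, head 0, sector 2 is logical block 1, i.e. byte offset 512, which is where `dd ... seek=1` writes the kernel. The sketch below shows both calculations. The helper names are hypothetical, and the geometry of 16 heads and 63 sectors per track is only an assumed example; at cylinder 0, head 0 the result is the same for any geometry.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode address translation: physical = segment * 16 + offset."""
    return segment * 16 + offset


def chs_to_lba(cylinder: int, head: int, sector: int,
               heads_per_cylinder: int, sectors_per_track: int) -> int:
    """Convert a CHS triple (sectors numbered from 1) to a zero-based logical block."""
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


SECTOR_SIZE = 512
kernel_lba = chs_to_lba(0, 0, 2, heads_per_cylinder=16, sectors_per_track=63)
kernel_byte_offset = kernel_lba * SECTOR_SIZE  # where the kernel sits in boot.img
```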
Step 2: Create the Kernel:
kernel.asm:
[BITS 16] ; We are writing 16-bit real mode code
[ORG 0x1000] ; Kernel load address (must match where the bootloader loads it)
start:
xor ax, ax
mov ds, ax ; DS = 0, so DS:SI addresses match the labels assembled with ORG 0x1000
cld ; lodsb moves forward through the string
mov ax, 0xB800 ; Video memory segment
mov es, ax
xor di, di ; Start at the beginning of video memory
; Print message to the screen
mov si, hello_msg
print_loop:
lodsb
cmp al, 0
je halt
mov [es:di], al ; Character byte
inc di
mov [es:di], byte 0x07 ; Attribute byte: light grey on black
inc di
jmp print_loop
halt:
hlt
jmp halt
hello_msg db 'Hello, Kernel!', 0
times 512 - ($-$$) db 0 ; Pad to 512 bytes so the kernel exactly fills the second sector
This code prints the string by writing it directly into text-mode video memory.
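In text mode, the memory at 0xB800:0000 holds one 2-byte cell per screen position. The character code sits at the even offset and its colour attribute at the odd offset, which is why the kernel increments DI twice per character. The sketch below rebuilds the bytes the kernel writes and computes the offset of a given cell. It assumes the 80-column, 25-row text mode that is normally active at boot. The helper names are hypothetical.

```python
def vga_text_cells(message: str, attribute: int = 0x07) -> bytes:
    """Bytes written at 0xB800:0000: one (character, attribute) pair per cell."""
    out = bytearray()
    for ch in message.encode("ascii"):
        out.append(ch)         # even offset: character code
        out.append(attribute)  # odd offset: colour (0x07 = light grey on black)
    return bytes(out)


def cell_offset(row: int, col: int, columns: int = 80) -> int:
    """Byte offset of a screen cell in text mode (2 bytes per cell)."""
    return (row * columns + col) * 2
```

For example, starting DI at cell_offset(1, 0) instead of 0 would print the message on the second line of the screen.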
Note: The ORG directive (origin) tells the assembler the address at which the code will be located in memory, so that it can compute correct absolute addresses for labels such as hello_msg. When generating a raw binary file, the assembler does not store this information in the output; it is up to the bootloader or loading mechanism to actually place the binary at that address.
In the provided example, the bootloader loads the kernel at address 0x1000, which matches ORG 0x1000. If you omit ORG 0x1000, NASM assumes an origin of 0, and mov si, hello_msg would load the message's offset from the start of the file instead of its real address. The kernel would then only work if the segment registers compensated for it, for example by jumping to 0x0100:0x0000 and setting DS to 0x0100 (0x0100 × 16 = 0x1000).
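The effect of ORG can be checked with simple arithmetic. In bin output, NASM encodes a label as ORG plus the label's byte offset in the file, and the CPU adds the segment base (segment × 16) at run time. The sketch below compares the three cases from the note. The offset 0x20 for hello_msg is a hypothetical value chosen for illustration, and the helper names are also hypothetical.

```python
def label_address(origin: int, offset_in_file: int) -> int:
    """Value NASM encodes for a label in bin output: ORG + the label's offset in the file."""
    return origin + offset_in_file


def physical_address(segment: int, offset: int) -> int:
    """Real-mode translation: physical = segment * 16 + offset."""
    return segment * 16 + offset


KERNEL_LOAD = 0x1000  # where the bootloader places the kernel
MSG_OFFSET = 0x20     # hypothetical offset of hello_msg inside kernel.bin
target = KERNEL_LOAD + MSG_OFFSET

# ORG 0x1000 with DS = 0: the encoded address reaches the string.
with_org = physical_address(0x0000, label_address(0x1000, MSG_OFFSET))
# No ORG (origin 0) with DS = 0: the address points into low memory instead.
no_org = physical_address(0x0000, label_address(0, MSG_OFFSET))
# No ORG but DS = 0x0100: the segment base compensates (0x0100 * 16 = 0x1000).
no_org_fixed = physical_address(0x0100, label_address(0, MSG_OFFSET))
```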
Step 3: Assemble the Bootloader and Kernel
nasm -f bin -o bootloader.bin bootloader.asm
nasm -f bin -o kernel.bin kernel.asm
Step 4: Create a bootable image:
dd if=/dev/zero of=boot.img bs=512 count=2880
dd if=bootloader.bin of=boot.img bs=512 count=1 conv=notrunc
dd if=kernel.bin of=boot.img bs=512 seek=1 conv=notrunc
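The three dd commands build a zero-filled image of 2880 sectors (2880 × 512 = 1,474,560 bytes, the size of a 1.44 MB floppy). They then write the bootloader into sector 0 and the kernel starting at sector 1, with conv=notrunc keeping the rest of the image intact. The sketch below does the same thing in memory so the layout can be checked. The function name build_boot_image is hypothetical, and the checks it performs are additions that dd itself does not do.

```python
SECTOR_SIZE = 512
FLOPPY_SECTORS = 2880  # 2880 * 512 = 1,474,560 bytes


def build_boot_image(bootloader: bytes, kernel: bytes,
                     total_sectors: int = FLOPPY_SECTORS) -> bytes:
    """Mirror the dd commands: zero-filled image, boot sector, kernel from sector 1."""
    if len(bootloader) != SECTOR_SIZE or bootloader[510:512] != b"\x55\xaa":
        raise ValueError("bootloader must be one 512-byte sector ending in 0x55 0xAA")
    if SECTOR_SIZE + len(kernel) > total_sectors * SECTOR_SIZE:
        raise ValueError("kernel does not fit in the image")
    image = bytearray(total_sectors * SECTOR_SIZE)        # dd if=/dev/zero ... count=2880
    image[0:SECTOR_SIZE] = bootloader                      # dd ... count=1 conv=notrunc
    image[SECTOR_SIZE:SECTOR_SIZE + len(kernel)] = kernel  # dd ... seek=1 conv=notrunc
    return bytes(image)
```

To use it, read bootloader.bin and kernel.bin in binary mode, pass their contents to build_boot_image, and write the result to boot.img.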
Step 5: Run the bootable image using an emulator
qemu-system-x86_64 -drive format=raw,file=boot.img
Example 2: A Kernel Made Up of Multiple Files
Example 1 is a simple single-file kernel that the bootloader loads into memory using BIOS functions. But what if the kernel is made up of multiple assembly files? Suppose our kernel consists of two files: kernel_main.asm and video.asm. There are two ways to build it:
- Include video.asm in kernel_main.asm with NASM's %include directive, placed after the halt loop so the included routines are not executed by falling through. This is the easiest way: only kernel_main.asm needs to be assembled.
- Assemble each file separately and then combine the results into a single binary.
Step 1: Write Multiple Assembly Files
The bootloader is the same as in Example 1.
File 1: kernel_main.asm:
[BITS 16] ; We are writing 16-bit real mode code
; No ORG here: ORG is only valid for NASM's bin format.
; The load address (0x1000) is given to the linker instead (-Ttext 0x1000).
global start ; Export the entry point for the linker (-e start)
extern init_video ; Declare external function (defined in video.asm)
extern print_hello ; Declare external function (defined in video.asm)
[SECTION .text] ; Code section
start:
call init_video ; Call the video initialization routine
call print_hello ; Call the print routine
halt:
hlt ; Halt the CPU
jmp halt ; Stay halted if an interrupt wakes the CPU
File 2: video.asm:
[BITS 16]
global init_video ; Make the label global for linking
global print_hello ; Make the label global for linking
[SECTION .text]
init_video:
mov ax, 0xB800 ; Video memory segment
mov es, ax
ret
print_hello:
mov si, hello_msg ; DS = 0 (set by the bootloader), so DS:SI points at the linked address
xor di, di
cld ; lodsb moves forward through the string
print_loop:
lodsb
cmp al, 0
je done
mov [es:di], al
inc di
mov [es:di], byte 0x07 ; Attribute byte
inc di
jmp print_loop
done:
ret
hello_msg db 'Hello, Kernel!', 0
Note: NASM's bin output format does not support external references, so the extern declarations fail if each file is assembled straight to a flat binary. Concatenating the files with cat would not fix up the call addresses either. Instead, each file is assembled to an ELF object file, and the linker resolves the symbols and then writes a flat binary. The [SECTION .text] directives put the code into each object file's .text section. The linker lays out the .text sections of the input files one after another, and with --oformat binary no ELF headers or section table end up in kernel.bin.
Step 2: Assemble the Assembly Files:
nasm -f elf32 -o kernel_main.o kernel_main.asm
nasm -f elf32 -o video.o video.asm
Step 3: Link the Object Files into a Single Binary
Use the GNU linker to resolve the external symbols, place the code at the kernel's load address, and write the result as a flat binary:
ld -m elf_i386 -Ttext 0x1000 -e start --oformat binary -o kernel.bin kernel_main.o video.o
Here -m elf_i386 selects 32-bit x86 ELF input (needed on a 64-bit host), -Ttext 0x1000 sets the address of .text to match where the bootloader loads the kernel, and --oformat binary writes raw bytes without any ELF headers. The -e start option only avoids ld's warning about a missing _start symbol; a flat binary has no entry-point field.
Note: The order of the object files on the ld command line matters. The linker places their .text sections in the order the files are listed, and the bootloader jumps to the first byte of kernel.bin (address 0x1000), so that byte must be the start label. The main entry point (e.g., kernel_main.o) must therefore come first. The remaining object files can follow in any order, because the linker resolves the addresses of the routines they define.
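The bootloader from Example 1 reads exactly one sector (AL = 1 in the INT 0x13 call). As more files are linked in, it is worth checking that kernel.bin still fits in the sectors the bootloader actually reads, because bytes beyond them are never loaded. Below is a minimal sketch with hypothetical helper names. In practice, the size would come from os.path.getsize("kernel.bin").

```python
SECTOR_SIZE = 512


def sectors_needed(size_in_bytes: int) -> int:
    """Number of 512-byte sectors needed to hold a file of the given size (rounded up)."""
    return (size_in_bytes + SECTOR_SIZE - 1) // SECTOR_SIZE


def fits_in_load(kernel_size: int, sectors_read: int = 1) -> bool:
    """True if a single read of `sectors_read` sectors (the AL value) covers the kernel."""
    return sectors_needed(kernel_size) <= sectors_read
```

If the check fails, the value loaded into AL in the bootloader has to be raised to sectors_needed(size).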
Step 4: Create the Bootable Image:
Combine the bootloader and kernel into a single disk image.
dd if=/dev/zero of=boot.img bs=512 count=2880
dd if=bootloader.bin of=boot.img bs=512 count=1 conv=notrunc
dd if=kernel.bin of=boot.img bs=512 seek=1 conv=notrunc
Step 5: Test the Bootable Image:
Use an emulator like QEMU to test the bootable image.
qemu-system-x86_64 -drive format=raw,file=boot.img