Different Kernel File Formats

OS Kernel Format

The kernel, which is the core component of an operating system, can be packaged and stored in different file formats for different purposes. The two main formats discussed here are flat binary (.bin) files and ELF (Executable and Linkable Format) files. Here is a comparison of storing a kernel as a flat binary versus as an ELF file:

1 Flat Binary (Bin) File

Characteristics:

  • Format: A flat binary file is a raw sequence of bytes that a bootloader or firmware can load directly into memory and execute.
  • Structure: It has no headers or metadata describing its contents. Execution begins wherever the loader jumps (usually the first byte of the image), and the file carries no information about segments or sections.

Concept of Sections:

In a pure binary file format (commonly referred to as a "flat binary"), there is no explicit concept of sections like .text, .data, or .bss as you would find in more complex executable formats such as ELF or PE. Instead, all code and data are simply placed sequentially in the file, and it is up to the programmer to manage the memory layout.

Advantages:

  • Simplicity: The format is straightforward, consisting of raw binary data that the system can directly load and execute.
  • Bootloaders: Some bootloaders or firmware may prefer or require raw binary images because they can be simpler to handle and parse.
  • Minimal Overhead: Without additional headers or metadata, the file size can be smaller, which might be beneficial in resource-constrained environments.

Disadvantages

  • Lack of Metadata: Since flat binary files have no headers, there is no embedded information about the file structure, making tasks like debugging, symbol resolution, or dynamic linking harder.
  • Limited Flexibility: The absence of segment and section information makes the format less suitable for advanced uses, such as dynamic loading or linking of kernel modules.

2 ELF (Executable and Linkable Format) File

Characteristics

  • Format: ELF files follow a standard layout with an ELF header, a program header table, and a section header table that describe the file's structure and contents.
  • Structure: Contains detailed information about different sections (code, data, BSS) and segments, along with metadata such as symbol tables and relocation information.

Advantages

  • Rich Metadata: ELF files contain extensive metadata, including information about sections and segments, which aids in debugging, dynamic linking, and symbol resolution.
  • Flexibility: The ELF format supports dynamic linking and loading of shared libraries or kernel modules, providing greater flexibility and modularity.
  • Tooling Support: Many development and debugging tools are designed to work with ELF files, making it easier to develop and maintain the kernel.
  • Debugging: The inclusion of symbol tables and other debugging information makes it easier to diagnose and fix issues within the kernel.

Disadvantages

  • Complexity: The format is more complex, which might require more sophisticated bootloaders that can parse ELF headers and load the kernel properly.
  • Overhead: The additional headers and metadata can increase the file size slightly, although this is often negligible compared to the benefits provided.
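The extra work an ELF-aware bootloader has to do can be sketched in a few lines. The Python below is a minimal, illustrative sketch (the name load_elf32 is hypothetical), assuming a 32-bit little-endian ELF image and a bytearray standing in for physical memory. It walks the program header table and, for each PT_LOAD segment, copies p_filesz bytes to p_paddr and zero-fills the rest up to p_memsz, which is how the .bss area ends up cleared. A real loader would also validate the header fully and honor segment permissions.

```python
import struct

PT_LOAD = 1  # program header type for a loadable segment


def load_elf32(image: bytes, memory: bytearray) -> int:
    """Copy the PT_LOAD segments of a 32-bit little-endian ELF image into memory.

    Returns the entry point (e_entry) the loader would jump to.
    """
    if image[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_entry and e_phoff sit at offsets 24 and 28; e_phentsize and e_phnum at 42 and 44.
    e_entry, e_phoff = struct.unpack_from("<II", image, 24)
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 42)
    for i in range(e_phnum):
        (p_type, p_offset, _vaddr, p_paddr,
         p_filesz, p_memsz, _flags, _align) = struct.unpack_from(
            "<8I", image, e_phoff + i * e_phentsize)
        if p_type != PT_LOAD:
            continue
        if (p_filesz > p_memsz or p_offset + p_filesz > len(image)
                or p_paddr + p_memsz > len(memory)):
            raise ValueError(f"segment {i} is out of bounds")
        # Copy the bytes stored in the file (code and initialized data) ...
        memory[p_paddr:p_paddr + p_filesz] = image[p_offset:p_offset + p_filesz]
        # ... then zero the remainder of the segment, e.g. the .bss area.
        memory[p_paddr + p_filesz:p_paddr + p_memsz] = bytes(p_memsz - p_filesz)
    return e_entry
```

A flat-binary loader, by contrast, skips all of this and copies the file to memory as one block.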

 

ELF File

An ELF (Executable and Linkable Format) file is a common standard file format used for executables, object code, shared libraries, and core dumps. It is widely used in Unix and Unix-like operating systems, such as Linux.

Key Components of an ELF File

1 Header:

  • Provides metadata about the file.
  • Includes the file type (executable, shared object, etc.), target architecture, entry point address, and the file offsets of the program header table and section header table.

2 Program Header Table:

  • Describes segments used at runtime.
  • Contains the information needed to map each segment into memory, such as its file offset, load address, size, and access permissions.

3 Section Header Table:

  • Describes the sections that make up the file.
  • Each section contains different types of data, such as code, data, symbol tables, and debugging information.

4 Sections:

  • Various parts of the file, including:
    • .text: Contains executable code.
    • .data: Contains initialized data.
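    • .bss: Contains uninitialized (zero-initialized) data; it takes up space in memory but not in the file.
    • .rodata: Contains read-only data.
    • .symtab and .strtab: Symbol table and string table for symbol names.
    • .debug_*: Contains debugging information (for example, the DWARF sections .debug_info and .debug_line).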
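To see how the header ties these components together, here is a minimal Python sketch (parse_elf32_header and Elf32Header are hypothetical names) that reads the main fields of a 32-bit little-endian ELF header with the standard struct module. e_phoff and e_shoff are the file offsets of the program header table and the section header table, and e_phnum and e_shnum are the number of entries in each.

```python
import struct
from typing import NamedTuple


class Elf32Header(NamedTuple):
    e_type: int      # file type (executable, shared object, ...)
    e_machine: int   # target architecture (e.g. 3 = EM_386)
    e_entry: int     # entry point address
    e_phoff: int     # file offset of the program header table
    e_shoff: int     # file offset of the section header table
    e_phnum: int     # number of program header entries
    e_shnum: int     # number of section header entries


def parse_elf32_header(data: bytes) -> Elf32Header:
    """Parse the fixed 52-byte header of a 32-bit little-endian ELF file."""
    if len(data) < 52 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 1 or data[5] != 1:  # EI_CLASS = ELFCLASS32, EI_DATA = ELFDATA2LSB
        raise ValueError("this sketch only handles 32-bit little-endian ELF")
    # The fields that follow the 16-byte e_ident array, in on-disk order.
    (e_type, e_machine, _version, e_entry, e_phoff, e_shoff, _flags,
     _ehsize, _phentsize, e_phnum, _shentsize, e_shnum, _shstrndx) = struct.unpack_from(
        "<HHIIIIIHHHHHH", data, 16)
    return Elf32Header(e_type, e_machine, e_entry, e_phoff, e_shoff, e_phnum, e_shnum)
```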

ELF File Types

1 Executable Files:

  • These files contain code and data for programs that can be executed by the operating system.

2 Shared Libraries:

  • These files contain code and data that can be used by multiple programs simultaneously, reducing memory and disk space usage.

3 Object Files:

  • Intermediate files produced during the compilation process.
  • They are later linked to form executables or shared libraries.

4 Core Dumps:

  • These files are snapshots of a program's memory and state at a specific point, typically when a program crashes.
  • They are used for debugging purposes.
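These file types are recorded in the e_type field of the ELF header, at byte offset 16 right after the 16-byte e_ident array. The Python sketch below (elf_file_type and ET_NAMES are hypothetical names) maps the standard values to their names; it assumes the input starts with an ELF header and takes the byte order from e_ident[EI_DATA].

```python
ET_NAMES = {
    0: "ET_NONE (no file type)",
    1: "ET_REL (relocatable object file)",
    2: "ET_EXEC (executable file)",
    3: "ET_DYN (shared object)",
    4: "ET_CORE (core dump)",
}


def elf_file_type(data: bytes) -> str:
    """Return the name of the e_type value stored at byte offset 16 of an ELF header."""
    if len(data) < 18 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    byteorder = "big" if data[5] == 2 else "little"  # EI_DATA: 1 = LSB, 2 = MSB
    e_type = int.from_bytes(data[16:18], byteorder)
    return ET_NAMES.get(e_type, f"other ({e_type:#x})")
```

Note that position-independent executables are also stored as ET_DYN, so e_type alone does not always distinguish a program from a shared library.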

Difference between Flat Binary and ELF Format:

| Feature | Flat Binary | ELF (Executable and Linkable Format) |
|---|---|---|
| Purpose | Simple, direct execution of code | Advanced, structured format for executables, libraries, and object files |
| Structure | Sequential, raw byte layout | Structured with headers, sections, and segments |
| Sections | No explicit sections; all code and data are placed sequentially | Supports multiple sections such as .text, .data, .bss |
| Metadata | None (raw bytes) | Detailed metadata in headers |
| Executable header | None, just raw instructions and data | ELF header (metadata about the file) |
| Portability | Tied to a specific load address, memory layout, and architecture | Standard format used on many systems and architectures (the code inside is still architecture-specific) |
| Linking | Cannot be linked further; any linking must happen before the flat image is produced | Supports static and dynamic linking |
| Loading mechanism | Simple: copy to a known address and jump to it | More complex: parse the ELF header and program headers, then load each segment |
| Debugging information | None | Can contain debugging information (e.g., symbol tables, debug sections) |
| Relocation | Not supported | Supports relocation entries (used in object files, shared libraries, and position-independent executables) |
| Symbol resolution | Not supported | Symbol tables for resolving function and variable names |
| Size overhead | Minimal (just the code and data) | Some overhead due to headers and additional sections |
| Error handling | Loader cannot validate the image; problems appear only at run time | Loader can check the magic number, class, and target machine in the header before loading |
| Usage scenarios | Embedded systems, bootloaders, minimal systems | General-purpose operating systems, complex applications |
| Tool support | Limited to basic tools such as assemblers and concatenation tools | Wide range of tools (linkers, debuggers, profilers) |
| File extensions | .bin, .img | .o, .so, .elf, or none for executables (.a archives bundle .o files) |
| Identification | Not identifiable by content alone | ELF magic number at the start of the file: 0x7F 'E' 'L' 'F' |
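The identification row is what makes automatic format detection possible. Below is a minimal Python sketch (identify_image is a hypothetical helper) that checks for the ELF magic number and, when it is present, decodes the class and byte-order bytes that follow it. Anything else can only be reported as "possibly" a flat binary, because a flat image has no signature of its own.

```python
def identify_image(data: bytes) -> str:
    """Guess whether a kernel image is an ELF file from its first bytes."""
    if len(data) >= 6 and data[:4] == b"\x7fELF":
        ei_class = {1: "32-bit", 2: "64-bit"}.get(data[4], "unknown class")
        ei_data = {1: "little-endian", 2: "big-endian"}.get(data[5], "unknown byte order")
        return f"ELF ({ei_class}, {ei_data})"
    # A flat binary has no magic number: its first byte is simply the first
    # instruction or data byte, so its content cannot prove what it is.
    return "not ELF (possibly a flat binary)"
```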

Loading the Binary Kernel

As discussed above, a flat binary is already in an execution-ready form. To run it, the bootloader only needs to copy the kernel into memory and jump to the address where it was loaded.
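For comparison with an ELF loader, the whole job of loading a flat binary can be modeled in a few lines. This Python sketch (load_flat is a hypothetical name) uses a bytearray to stand in for physical memory. The load address has to come from the caller, because the image carries no header that could supply it, and execution starts at the first byte.

```python
def load_flat(image: bytes, memory: bytearray, load_addr: int) -> int:
    """Copy a flat binary to load_addr and return the address to jump to."""
    end = load_addr + len(image)
    if end > len(memory):
        raise ValueError("image does not fit in memory")
    memory[load_addr:end] = image  # the file is copied byte for byte, nothing is parsed
    return load_addr               # entry point = first byte of the image
```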

Example 1: Single File Kernel:

Let's create a simple kernel and bootloader in assembly language. Both are assembled as plain flat binaries with no headers. The bootloader loads the kernel into memory and jumps to its location to execute its code.

Step 1: Create the Bootloader

The bootloader resides in the first sector of the disk (512 bytes) and loads the kernel from the second sector.
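A legacy BIOS only boots from the first sector if its last two bytes are 0x55 and 0xAA. The dw 0xAA55 line at the end of the bootloader produces exactly that byte sequence, because x86 stores words little-endian. A minimal Python check (has_boot_signature is a hypothetical helper) that a build script could run on bootloader.bin:

```python
SECTOR_SIZE = 512


def has_boot_signature(sector: bytes) -> bool:
    """Return True if a 512-byte boot sector ends with the bytes 0x55 0xAA."""
    return len(sector) == SECTOR_SIZE and sector[510:512] == b"\x55\xAA"
```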

bootloader.asm:

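[BITS 16]           ; We are writing 16-bit real mode code
[ORG 0x7C00]        ; BIOS loads the bootloader at address 0x7C00

start:
    ; Set up segment registers and a stack just below the bootloader
    cli
    xor ax, ax
    mov ds, ax
    mov es, ax
    mov ss, ax
    mov sp, 0x7C00
    sti
    cld                ; lodsb must move forward through strings

    ; DL still holds the boot drive number passed in by the BIOS,
    ; so we use it as-is instead of hard-coding 0x80.

    ; Load the second sector (the kernel) to 0x0000:0x1000 (physical 0x1000)
    mov ah, 0x02       ; BIOS read sectors function
    mov al, 1          ; Number of sectors to read
    mov ch, 0          ; Cylinder number
    mov cl, 2          ; Sector number (sectors start at 1, so 2 is the second sector)
    mov dh, 0          ; Head number
    mov bx, 0x1000     ; ES:BX = 0x0000:0x1000 is the load address
    int 0x13           ; Call BIOS disk services

    jc load_error      ; If carry flag is set, the read failed

    ; Jump to the kernel entry point (the same segment:offset used for loading)
    jmp 0x0000:0x1000

load_error:
    ; Print error message and halt
    mov si, error_msg
    mov bh, 0          ; Video page 0 for teletype output
print_loop:
    lodsb
    cmp al, 0
    je halt
    mov ah, 0x0E       ; BIOS teletype output
    int 0x10
    jmp print_loop

halt:
    hlt
    jmp halt

error_msg db 'Error loading kernel!', 0

times 510-($-$$) db 0
dw 0xAA55          ; Boot signature

This code loads the second sector of the disk, which contains the kernel, to ES:BX = 0x0000:0x1000 (physical address 0x1000) and then far-jumps to that same address. It reads from the drive number the BIOS passed in DL, so it works whether the image is attached as a floppy or a hard disk.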
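In real mode, the physical address of segment:offset is segment × 16 + offset. The bootloader loads the kernel with ES = 0 and BX = 0x1000, so a far jump to the kernel must use a pair that resolves to physical 0x1000, such as 0x0000:0x1000; the pair 0x1000:0x0000 resolves to 0x10000 instead. A one-line Python sketch of the calculation (linear_address is a hypothetical name; it ignores the 1 MiB wrap-around controlled by the A20 line):

```python
def linear_address(segment: int, offset: int) -> int:
    """Real-mode physical address of segment:offset (ignoring 1 MiB wrap-around/A20)."""
    return segment * 16 + offset
```

For example, 0x07C0:0x0000 and 0x0000:0x7C00 name the same byte, which is why both forms appear in boot code.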

Step 2: Create the Kernel:

kernel.asm:

[BITS 16]           ; We are writing 16-bit real mode code
[ORG 0x1000]        ; Kernel load address (must match where the bootloader puts it)

start:
    xor ax, ax
    mov ds, ax      ; DS = 0, so DS:SI reaches hello_msg (assembled relative to 0x1000)
    cld             ; lodsb moves forward through the string

    mov ax, 0xB800  ; Text-mode video memory segment
    mov es, ax
    xor di, di      ; Start at the top-left character cell

    ; Print message to the screen
    mov si, hello_msg
print_loop:
    lodsb
    cmp al, 0
    je halt
    mov [es:di], al           ; Character byte
    inc di
    mov byte [es:di], 0x07    ; Attribute byte: light grey on black
    inc di
    jmp print_loop

halt:
    hlt
    jmp halt

hello_msg db 'Hello, Kernel!', 0

times 512 - ($-$$) db 0

; 512 bytes in size, so it fits exactly in the second sector.

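This code writes the string directly into text-mode video memory at segment 0xB800, storing each character followed by its attribute byte (0x07, light grey on black).

Note: The ORG directive (origin) tells the assembler the address at which the code will run, so that it can compute the absolute addresses of labels such as hello_msg. When generating a raw binary file, the assembler does not embed this information in the output. It is up to the bootloader or loading mechanism to load the binary at that address.

In this example, the bootloader loads the kernel at physical address 0x1000 and leaves DS = 0, so the kernel must be assembled with ORG 0x1000. Code that uses only relative jumps and calls would run without it, but absolute data references such as mov si, hello_msg would not: NASM's bin format defaults to ORG 0, so SI would hold the offset of hello_msg within the file rather than 0x1000 plus that offset, and lodsb would read from the wrong place. Omitting ORG works only if DS is set to segment 0x0100, so that offset 0 in that segment is physical address 0x1000.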
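The rule above boils down to one condition: DS × 16 + ORG must equal the load address. The Python sketch below (data_reference_ok is a hypothetical helper, and the label offset 0x30 in the usage is an arbitrary illustrative value) models what the assembler encodes for mov si, label and where lodsb then reads.

```python
def data_reference_ok(load_addr: int, org: int, ds: int, label_offset: int) -> bool:
    """Check whether a label reference assembled with `org` finds its data at run time.

    label_offset is the label's position inside the flat binary file.
    """
    si = org + label_offset                      # value the assembler encodes for `mov si, label`
    physical_read = ds * 16 + si                 # where lodsb actually reads (DS:SI)
    physical_actual = load_addr + label_offset   # where the bootloader put that byte
    return physical_read == physical_actual
```

With load_addr = 0x1000, the combination ORG 0x1000 and DS = 0 works, ORG 0 and DS = 0 does not, and ORG 0 and DS = 0x0100 works again.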

Step 3: Assemble the Bootloader and Kernel

nasm -f bin -o bootloader.bin bootloader.asm
nasm -f bin -o kernel.bin kernel.asm

Step 4: Create a bootable image:

dd if=/dev/zero of=boot.img bs=512 count=2880
dd if=bootloader.bin of=boot.img bs=512 count=1 conv=notrunc
dd if=kernel.bin of=boot.img bs=512 seek=1 conv=notrunc
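The three dd commands produce a 1,474,560-byte image (2880 sectors of 512 bytes, the size of a 1.44 MB floppy) with the bootloader in the first sector and the kernel starting at byte offset 512. That second sector is exactly what the bootloader requests with cylinder 0, head 0, sector 2. The Python sketch below (build_floppy_image is a hypothetical helper) reproduces the same layout, which can be handy where dd is unavailable; it assumes the bootloader is exactly 512 bytes.

```python
SECTOR = 512


def build_floppy_image(bootloader: bytes, kernel: bytes, sectors: int = 2880) -> bytes:
    """Mimic the dd commands: zeroed image, boot sector first, kernel from the second sector."""
    if len(bootloader) != SECTOR:
        raise ValueError("bootloader must be exactly one 512-byte sector")
    image = bytearray(sectors * SECTOR)   # dd if=/dev/zero of=boot.img bs=512 count=2880
    image[0:SECTOR] = bootloader          # dd if=bootloader.bin ... count=1 conv=notrunc
    end = SECTOR + len(kernel)
    if end > len(image):
        raise ValueError("kernel does not fit in the image")
    image[SECTOR:end] = kernel            # dd if=kernel.bin ... seek=1 conv=notrunc
    return bytes(image)
```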

Step 5: Run the bootable image using an emulator

qemu-system-x86_64 -drive format=raw,file=boot.img

Example 2: A Kernel Made Up of Multiple Files

Example 1 is a simple single-file kernel that the bootloader loads into memory using BIOS functions. But what if the kernel is made up of multiple assembly files? Suppose it consists of two files: kernel_main.asm and video.asm.

The easiest option is to pull video.asm into kernel_main.asm with NASM's %include directive. In that case, only kernel_main.asm needs to be assembled.

Alternatively, we can leave out the %include and assemble video.asm separately, so that we end up with two separately assembled files. In that case, they have to be merged into a single binary, and the calls from one file into the other must be resolved to the correct addresses.

Step 1: Write Multiple Assembly Files

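The bootloader is the same as in Example 1.

File 1: kernel_main.asm:

[BITS 16]              ; We are writing 16-bit real mode code
                       ; No ORG: in an object file, the load address (0x1000)
                       ; is given to the linker instead (see Step 3)

global start           ; Export the entry point for the linker
extern init_video      ; Declare external function (defined in video.asm)
extern print_hello     ; Declare external function (defined in video.asm)

[SECTION .text]        ; Code section

start:
    xor ax, ax
    mov ds, ax         ; DS = 0, because labels are linked as offsets from address 0
    cld                ; lodsb in print_hello walks the string forward
    call init_video    ; Call the video initialization routine
    call print_hello   ; Call the print routine
.halt:
    hlt                ; Halt the CPU
    jmp .halt          ; Halt again if an interrupt wakes the CPU

; No sector padding here: padding this file to 512 bytes would push video.asm's
; code into a second sector, which the Example 1 bootloader never reads.

File 2: video.asm:

[BITS 16]

global init_video         ; Make the label global for linking
global print_hello        ; Make the label global for linking

[SECTION .text]

init_video:
    mov ax, 0xB800         ; Text-mode video memory segment
    mov es, ax
    ret

print_hello:
    mov si, hello_msg      ; DS:SI -> message (DS = 0, set in kernel_main.asm)
    xor di, di             ; Start at the top-left character cell
print_loop:
    lodsb
    cmp al, 0
    je done
    mov [es:di], al        ; Character byte
    inc di
    mov byte [es:di], 0x07 ; Attribute byte: light grey on black
    inc di
    jmp print_loop
done:
    ret

hello_msg db 'Hello, Kernel!', 0

Note: When the final output is a flat binary, the section names do not survive in the file: NASM's bin format and ld's --oformat binary both place the sections one after another and write no section headers or names into the output. While the files are still ELF object files, however, the SECTION .text directive does matter: it tells the linker which pieces of code to merge.

Step 2: Assemble the Assembly Files:

Because kernel_main.asm refers to labels defined in video.asm, the files cannot be assembled with -f bin (NASM's bin format does not support external references). Assemble them as 32-bit ELF object files instead; the [BITS 16] directive still makes NASM emit 16-bit code.

nasm -f elf32 -o kernel_main.o kernel_main.asm
nasm -f elf32 -o video.o video.asm

Step 3: Link the Object Files into a Single Binary

Simply concatenating separately assembled binaries with cat does not work here: when kernel_main.asm is assembled on its own, the assembler does not know where init_video and print_hello will end up, so it cannot encode the calls to them. Concatenation only works for pieces that make no references to each other and were each assembled with the ORG at which they will finally sit. Instead, use a linker to place the code at 0x1000, resolve the cross-file references, and write the result as a flat binary:

ld -m elf_i386 -Ttext 0x1000 -e start --oformat binary -o kernel.bin kernel_main.o video.o

Make sure kernel_main.o is listed first, so that start lands at 0x1000, where the bootloader jumps. Also keep the linked kernel.bin within 512 bytes, because the bootloader from Example 1 reads only one sector (increase the sector count in AL if the kernel grows).

Note: The order of the input files matters. The linker lays out the .text sections in the order the files appear on the command line, so the file containing the entry point (kernel_main.o) should come first, followed by the other object files in the intended memory layout.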
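Here is why the cross-file calls need a linker. A 16-bit near call is encoded as opcode E8 followed by a 16-bit displacement measured from the next instruction, which starts 3 bytes after the call. That displacement depends on where the target (for example init_video) ends up in the final image, so it cannot be filled in when the calling file is assembled on its own. The minimal Python sketch below (call_rel16 is a hypothetical name, and the addresses in the usage are illustrative) computes the bytes a linker would patch in once the layout is fixed.

```python
def call_rel16(call_addr: int, target_addr: int) -> bytes:
    """Encode `call target` placed at call_addr as E8 + little-endian 16-bit displacement."""
    disp = (target_addr - (call_addr + 3)) & 0xFFFF  # relative to the next instruction, 16-bit wrap
    return bytes([0xE8]) + disp.to_bytes(2, "little")
```

Moving either the call or its target by even one byte changes the encoded displacement, which is exactly the bookkeeping the linker takes over.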

Step 4: Create the Bootable Image:

Combine the bootloader and kernel into a single disk image.

dd if=/dev/zero of=boot.img bs=512 count=2880
dd if=bootloader.bin of=boot.img bs=512 count=1 conv=notrunc
dd if=kernel.bin of=boot.img bs=512 seek=1 conv=notrunc

Step 5: Test the Bootable Image:

Use an emulator like QEMU to test the bootable image.

qemu-system-x86_64 -drive format=raw,file=boot.img