Instruct and inspect

Instructions, registers, and system calls in x86 assembly

September 23, 2024

I'm on a journey this week to understand OS processes. The process is a fundamental abstraction the OS provides to users. On a basic level, it is simply a running program. A modern operating system must be able to run a multitude of processes seemingly at once. It does this through virtualization: the OS shares the CPU among processes and gives each one its own virtual address space, a private view of memory in which the program's code and data live. The OS provides its own API to create, run, and destroy processes. It handles running many programs concurrently, managing memory and resources under the hood. Before we get into how we can leverage the process API, I want to understand how a program is executed on a machine.

Let's start simple:

Here's a really basic program written in assembly:

section .text

global _start

_start:
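	MOV eax,1	; system call number 1 (exit on 32-bit Linux)
	MOV ebx,0	; first argument: the exit code
	INT 80h		; hand control to the kernel to make the call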

This program doesn't really do much—it runs and then immediately exits—but that's okay. Let's break it down.

We begin with a couple of directives, which are instructions to the assembler (i.e., a program that converts assembly into machine code) rather than to the CPU. section .text tells the assembler that everything below it belongs in the section that holds actual CPU instructions. The global _start directive makes the _start label visible to the linker, which by default uses it as the program's entry point, the place where execution begins.

Now for the main course: the instructions. These are commands that typically correspond directly to machine code operations. The MOV instruction directs the CPU to copy data from one location to another. In NASM syntax, the destination comes first:

MOV destination, source

In the program above, we move 0x1 and 0x0 into eax and ebx, respectively. What are these weird letters? Maybe variables? No, they are registers, which are used for storing small amounts of data within the CPU for quick retrieval. eax and ebx are specific to x86 architecture. I won't go into much further detail on them for now, but they are important!

INT 80h is an interesting one. The INT instruction raises a software interrupt, an event that alters the normal execution flow of the program. The operand 80h is NASM's notation for the hexadecimal number 0x80 (128 in decimal), and on 32-bit Linux, interrupt 0x80 is the entry point for system calls. I'll go into more detail on system calls in a future post, but in our program, once the interrupt is raised, we transition from user space to kernel space. The kernel goes, "Hey, I'm here… let's see what I should do." It determines which system call is wanted by looking at the value in the eax register. The arguments for each call are passed in other registers; the system call in this program takes only one argument, which lives in the ebx register.
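To make the dispatch step concrete, here is a toy model in Python. It is only a sketch of the idea described above, not how the Linux kernel is actually implemented: the table, the handler, and the function names are all hypothetical, and the "registers" are just a dictionary.

```python
# Toy model of system call dispatch. Everything here is illustrative:
# SYSCALL_TABLE, sys_exit, and handle_int_0x80 are made-up names.

def sys_exit(status):
    """Stand-in for the kernel's exit handler: report what it would do."""
    return f"exit with status {status}"

# Hypothetical table mapping system call numbers to handlers.
# On 32-bit x86 Linux, number 1 is exit.
SYSCALL_TABLE = {1: sys_exit}

def handle_int_0x80(registers):
    """Choose a handler from eax and pass it the argument held in ebx."""
    handler = SYSCALL_TABLE.get(registers["eax"])
    if handler is None:
        return "unknown system call"
    return handler(registers["ebx"])

registers = {"eax": 1, "ebx": 0}   # state after the two MOV instructions
print(handle_int_0x80(registers))  # prints: exit with status 0
```

The point is only that eax selects which call runs and ebx carries its argument.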

Here's a really nice visualization of this.

Because we have moved 0x1 into eax, the system call is exit. The value of ebx is the exit code.

Let's try actually running this program. First we've got to assemble the program into an object file with nasm:

nasm -f elf32 program.asm -o program.o

Now we've got to link the object file and create the executable (more on linking later):

ld -m elf_i386 -o my_program program.o

We can now run the executable:

./my_program

But nothing seems to happen. The program just runs and exits. Well, that's basically what we've told the OS to do. One last thing: let's see the exit code of the last executed program with

echo $?

This will print 0, which is the exact value we placed in the ebx register!
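The same exit code is visible to any parent process, not just the shell. As a small sketch, the Python snippet below starts a child process and reads its exit code. It uses a throwaway Python child as a stand-in so that it runs anywhere; the helper name exit_status_of is my own invention.

```python
import subprocess
import sys

def exit_status_of(code):
    """Run a child process that exits with `code`; return what the parent sees."""
    child = subprocess.run(
        [sys.executable, "-c", f"import sys; sys.exit({code})"]
    )
    return child.returncode

print(exit_status_of(0))  # 0
print(exit_status_of(7))  # 7
```

If you have built the executable from the steps above, passing ["./my_program"] to subprocess.run instead should report the same 0 that echo $? printed.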

Ok, but this isn't very exciting. Let's go a layer deeper and actually look at the machine code!

Inspecting machine code with objdump

We can actually see the binary / hex representation of our program above with objdump.

First we assemble the program, like we did above:

nasm -f elf32 program.asm -o program.o

We can look at the actual machine instructions next to their assembly counterparts by disassembling the object file:

objdump -d program.o

We can ignore the first few lines of the output. Let's just look at this part for now:

00000000 <_start>:
   0:	b8 01 00 00 00       	mov    $0x1,%eax
   5:	bb 00 00 00 00       	mov    $0x0,%ebx
   a:	cd 80                	int    $0x80

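In the left column we see that _start begins at offset 00000000. Because this is an object file that has not been linked yet, that is an offset within the .text section rather than a final memory address. The b8 opcode is a specific instruction to move the 32-bit value that follows it into the eax register. bb does the same, just for the ebx register. Note that objdump prints AT&T syntax by default, so the operands appear in source, destination order, the reverse of what we wrote for NASM (objdump -d -M intel prints the destination first).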
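As a check on that reading of the bytes, here is a minimal Python sketch that builds the same twelve bytes by hand. It covers only the two instruction forms used in this program; the function names and the REG_INDEX table are mine, not part of any tool.

```python
import struct

# Register numbers used by the "B8 + register" form of MOV with a
# 32-bit immediate. Only the two registers from this post are listed.
REG_INDEX = {"eax": 0, "ebx": 3}

def encode_mov_imm32(reg, value):
    """Encode MOV reg, value: opcode 0xB8 + register number, then 4 value bytes."""
    opcode = 0xB8 + REG_INDEX[reg]
    # "<I" packs an unsigned 32-bit integer, least significant byte first.
    return bytes([opcode]) + struct.pack("<I", value)

def encode_int(vector):
    """Encode INT vector: opcode 0xCD followed by a one-byte interrupt number."""
    return bytes([0xCD, vector])

program = (
    encode_mov_imm32("eax", 1)
    + encode_mov_imm32("ebx", 0)
    + encode_int(0x80)
)
print(program.hex(" "))  # b8 01 00 00 00 bb 00 00 00 00 cd 80
```

The printed bytes match the middle column of the objdump listing, including bb being b8 plus the register number of ebx.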

But something is odd here...

01 00 00 00

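This doesn't look like 0x1, which is the value we want for our exit system call. It turns out this is just a different representation of 0x1. x86 uses something called little-endian byte order, where the least significant byte comes first. Written out as four bytes, 0x1 is 0x00000001, so the 01 byte is stored first and the three zero bytes follow. This is an aside, though. Each of the first two instructions is 5 bytes in length: one opcode byte plus a four-byte value. That puts the third instruction at offset a (10 in decimal), the 11th byte of the program, where we find the interrupt instruction, encoded as the cd opcode followed by the interrupt number 80.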
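Both points are easy to check in Python. This is just an illustration using built-in integer methods; the sample value 0x12345678 is arbitrary and chosen so that the byte reversal is visible.

```python
# Byte order: the same 32-bit value written both ways.
one_little = (0x1).to_bytes(4, "little")
sample_little = (0x12345678).to_bytes(4, "little")
sample_big = (0x12345678).to_bytes(4, "big")

print(one_little.hex(" "))     # 01 00 00 00
print(sample_little.hex(" "))  # 78 56 34 12
print(sample_big.hex(" "))     # 12 34 56 78

# Offsets: each instruction starts where the previous one ended.
lengths = [5, 5, 2]  # mov eax, mov ebx, int
offsets = []
position = 0
for length in lengths:
    offsets.append(position)
    position += length

print([hex(offset) for offset in offsets])  # ['0x0', '0x5', '0xa']
```

The offsets line up with the left column of the objdump output: 0, 5, and a.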

Unanswered Questions

I began this post talking about processes and memory addresses. But we have yet to see how this elementary interrupt program fits into the OS ecosystem. How does the kernel actually run this program? How do we get from lines of text in a program.asm file on disk to a running process?