In the next of these small explorations into x86 system calls I want to rewrite a popular UNIX command, cat, in assembly. This will be a simplified version of the command. It won't have any flags, just a single argument for the file name. Here's what's in store:
So it looks like this will involve five system calls, which as you might have guessed are open, read, write, close and exit.
To start we'll open a file by making a system call. All we need for this system call is a pointer to a file name we want to open and to specify some options:
_start:
    mov eax, 5          ; 5 (0x05) is the system call number for open
    mov ebx, filename   ; pointer to the file name, defined in .data section above
    mov ecx, 0          ; flags: 0 means read-only (O_RDONLY)
    int 80h             ; trap into the kernel to make the call
    call exit           ; implemented elsewhere
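For comparison, here is roughly the same call made through the C library wrapper. This is only a sketch in C rather than assembly: open_readonly is a hypothetical helper name of my own, and the path is whatever the caller passes in.

```c
#include <fcntl.h>   /* open, O_RDONLY */

/* Hypothetical helper mirroring the assembly above:
 * eax = 5 selects open, ebx holds the path pointer, ecx holds the flags.
 * O_RDONLY is 0 on Linux, which is why the assembly loads 0 into ecx. */
int open_readonly(const char *path)
{
    /* The C wrapper returns a file descriptor on success, or -1 with
     * errno set. The raw int 80h call reports failure differently: it
     * leaves a negative error number in eax. */
    return open(path, O_RDONLY);
}
```

Calling open_readonly("/dev/null") should hand back a small non-negative integer, while a path that does not exist should give -1.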
After the open system call we look at the registers and see the following (I use gdb for this - see previous posts):
eax    0x3          3
ecx    0x0          0
edx    0x0          0
ebx    0x804a000    134520832
Looks like all our registers have remained the same except one: eax has a new value. This register is where the kernel typically places return values from system calls. In the man 2 open page we see that the return value is something called a file descriptor (fd):
The return value of open() is a file descriptor, a small, nonnegative integer that is used in subsequent system calls...to refer to the open file. The file descriptor returned by a successful call will be the lowest-numbered file descriptor not currently open for the process.
The kernel keeps a record, for every process, of each file that process has open. It is stored in something called the file descriptor table. You can get a sense of what this looks like by listing the fd directory of a process:
$ ls /proc/[pid]/fd
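The "lowest-numbered file descriptor not currently open" rule from the man page excerpt can be checked with a few lines of C. This is a sketch under my own assumptions: the helper name reopen_gets_same_fd is hypothetical, /dev/null is just a convenient file to open, and the process is assumed to be single-threaded.

```c
#include <fcntl.h>   /* open, O_RDONLY */
#include <unistd.h>  /* close */

/* Hypothetical helper: opens /dev/null, closes it, and opens it again.
 * Returns 1 if both opens produced the same descriptor number, 0 if
 * they differed, and -1 if either open failed. */
int reopen_gets_same_fd(void)
{
    int first = open("/dev/null", O_RDONLY);
    if (first < 0)
        return -1;
    close(first);               /* this slot in the fd table is free again */

    int second = open("/dev/null", O_RDONLY);
    if (second < 0)
        return -1;
    close(second);

    return first == second;     /* the lowest free slot gets reused */
}
```

Since nothing else opens or closes files between the two calls, the second open should land in the slot the first one just vacated.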
But that's not all that happens:
A call to open() creates a new open file description, an entry in the system-wide table of open files.
The OS also maintains a global list of all files it has open. We don't need to get too into this at the moment, but good to be aware of!
Let's pause for a moment and ask ourselves: what is a file? Most of the time when we're interacting in desktop environments we interact with finite items that contain things like text, images, executables. However, a file is just a collection of bytes organized in a manner specific to a program it is intended for. And these files don't have to be fixed in size, they can also be streams of data or continuous flows of bytes.
In UNIX environments nearly everything is treated as a file, including streams such as keyboard input, network sockets, and anonymous pipes.
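One way to see this is to push bytes through an anonymous pipe with the very same read and write calls used for regular files. The sketch below is in C for brevity; pipe_round_trip is a hypothetical helper name, and the message is arbitrary.

```c
#include <string.h>  /* memcmp */
#include <unistd.h>  /* pipe, read, write, close, ssize_t */

/* Hypothetical helper: sends a short message through an anonymous pipe.
 * Returns 1 if the bytes read back match the bytes written, 0 otherwise. */
int pipe_round_trip(void)
{
    int fds[2];                     /* fds[0] is the read end, fds[1] the write end */
    const char msg[] = "hello";
    char buf[sizeof msg] = {0};

    if (pipe(fds) != 0)
        return 0;

    /* Both ends are plain file descriptors, so write and read just work. */
    ssize_t written = write(fds[1], msg, sizeof msg);
    ssize_t got = read(fds[0], buf, sizeof buf);

    close(fds[0]);
    close(fds[1]);

    return written == (ssize_t)sizeof msg
        && got == (ssize_t)sizeof msg
        && memcmp(buf, msg, sizeof msg) == 0;
}
```

There is no file on disk involved here at all, yet the calls are identical to the ones a cat clone would use on a regular file.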
With this knowledge we can turn back to our original question about the starting fd value for our newly opened file.
We open a file with our system call above, and get a new file descriptor. On our behalf, the kernel creates a new entry in both the process-specific file descriptor table and the global file table.
But why does our new file have the fd of 3? We can get a hint from my previous investigation where we performed a write system call. This call takes three arguments, the first of which is an integer. From man 2 write:
ssize_t write(int fd, const void *buf, size_t count);
The first argument looks like a file descriptor, and we used the integer 1 to write to stdout. Let's look at the man page for stdout.
From man 3 stdout:
Under normal circumstances every UNIX program has three streams opened for it when it starts up, one for input, one for output, and one for printing diagnostic or error messages.
Every time a process begins it has three file descriptors open by default: standard input (0), standard output (1) and standard error (2). So, when our process opens a new file, the lowest descriptor not already in use is 3, and that is the fd we get.
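The three default descriptors have named constants in C, which makes the numbering easy to confirm. A small sketch, again in C rather than assembly; say_hello is a hypothetical helper and the message text is arbitrary.

```c
#include <unistd.h>  /* write, STDIN_FILENO, STDOUT_FILENO, STDERR_FILENO */

/* Hypothetical helper: writes a message straight to file descriptor 1,
 * the same number used to reach stdout in the earlier write example.
 * Returns the number of bytes written, or -1 on error. */
ssize_t say_hello(void)
{
    static const char msg[] = "hello from fd 1\n";
    /* sizeof msg - 1 skips the terminating NUL byte. */
    return write(STDOUT_FILENO, msg, sizeof msg - 1);
}
```

STDIN_FILENO, STDOUT_FILENO and STDERR_FILENO are defined as 0, 1 and 2, so the first file a freshly started process opens has nowhere lower to go than 3.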
Now that our file is open, we want to actually read from it and load its contents into memory somehow. We've barely scratched the surface of the file system API though, especially with regard to how permissions are set on each file and what happens when more than one process is reading or writing to the same file. We'll get there though!