OpenBSD src - kern_exit.c

03 Oct 2024

This post is part of a series in which I try to learn more about the C source code of the OpenBSD operating system. This one explains the exit syscall. Rather than describing each line of the implementation, the posts usually contain an introduction to the topic, one or two highlights, and a conclusion.

The code of kern_exit resides in the directory /sys/kern/. Since the way syscalls are included in the kernel and the way they are called is a bit special, I'll write another post describing the process of actually generating, building and calling syscalls in OpenBSD. For now, let's assume that the initial call is being made with the prototype void sys_exit(int rval);. Speaking of calls, the simplified call graph of the exit syscall looks something like this:

sys_exit |-> exit1
         |-(traced?)-> process_untrace
         |-(loop children)-> process_reparent
         |-(loop orphans)---> process_clear_orphan
         |-------------------> process_reparent
         |(------------------)> exit2
                                |---> reaper
                                      |-(thread?)-> proc_free
                                      |-(!zombie?)-> process_zap

For readability, the call graph shows only locally defined functions (inside kern_exit.c), and only a small fraction of them. A connector in parentheses indicates an indirect call (exit2 is not called directly). First, the function sys_exit acts as a wrapper that passes its arguments to the function exit1. Then exit1:

checks if the process to kill has multiple threads or one,
sets the exit status,
updates the runtime information,
collects resource stats for all threads and releases the memory back to the corresponding zombie pool,
closes open files,
kills the controlling terminal session of the process,
removes the unveil state of the process,
removes the process from the list of all threads and the hash list,
marks the PID as free to use,
reparents children (either to the original parent or to init),
adds the resource usage to the process total and clears the CPU utilization,
wakes up the parent if it was waiting and reparents the process to init,
waits until the vmspace and stack of the dead process aren't used anymore,
and finally calls exit2.

The exit2 function then adjusts the runtime information for the remainder of exit1's execution, locks the dead process with a special spin-lock mechanism, and prepares for the reaper function to be called by a kernel thread. The reaper function is then responsible for freeing the remaining vmspace and for one of three things: calling proc_free (if the process was just a thread), sending SIGCHLD to the parent to wake it up (if the process is marked as a zombie), or otherwise calling process_zap, which finally kills the process.

Since running this syscall mostly just means calling a bunch of other functions from the kernel, it was hard to find anything like a highlight. Nevertheless, the first interesting piece of code I came across was SCARG. It's a macro used to access the parameters of the syscall. It's not yet clear to me why this structure and this way of accessing it are necessary (maybe that will change with the next post, where the topic of calling will be covered in more detail). The relevant source code is:

// @OpenBSD:/sys/kern/kern_exit.c
int
sys_exit(struct proc *p, void *v, register_t *retval)
{
  struct sys_exit_args /* {
    syscallarg(int) rval;
  } */ *uap = v;

  exit1(p, SCARG(uap, rval), 0, EXIT_NORMAL);
  /* NOTREACHED */
  return (0);
}

// @OpenBSD:/sys/sys/systm.h
#elif _BYTE_ORDER == _LITTLE_ENDIAN
#define SCARG(p, k) ((p)->k.le.datum) /* get arg from args pointer */

// @OpenBSD:/sys/sys/syscallargs.h
#define syscallarg(x)             \
  union {               \
    register_t pad;           \
    struct { x datum; } le;         \
    struct {            \
      int8_t pad[ (sizeof (register_t) < sizeof (x))  \
        ? 0         \
        : sizeof (register_t) - sizeof (x)];  \
      x datum;          \
    } be;             \
  }

// @OpenBSD:/sys/sys/types.h
typedef __register_t register_t; /* register-sized type */
// @OpenBSD:/sys/arch/*/include/_types.h
typedef long __register_t;

In the code snippets above, uap is a pointer to a struct whose single member, rval, is a union created by syscallarg(int). The SCARG macro then accesses the datum field inside that union. For little-endian byte order, it simply reads the datum of the inner le struct, which sits at the start of the register-sized slot. The interesting part is the big-endian case: there, the pad array is placed in front of datum (if the type is smaller than a register), so that the value is read from the end of the register-sized slot instead. I definitely like the way pseudo dynamic typing and variadic parameters are implemented here by syscallarg.

As shown, a lot is happening inside the kernel when a process is killed, more than is described in the man page. So I think it's a good idea to take a look at the source code if you want to know what's going on, and I really recommend doing so. The way the code is written makes it easy to read and understand. Much is left uncovered in this post due to its complexity, and some of the functions involved in the exit syscall have not been shown here to keep the post short.