OpenBSD src - cat.c

03 Aug 2023

This post is part of a series of source code readings. Today, some highlights of OpenBSD's cat are presented.

Cat is used to concatenate files. Since its main goal is reading files, it should contain a good implementation of that. Optional flags moderately customize the output and behaviour: lines can be numbered, and whitespace and other non-printing characters can be made visible. This can be used to quickly analyze a small text-based file.

Overall, five functions are responsible for the execution of this program. One does the raw concatenating and is looked at in more depth here. The other functions handle the parameters and flags passed to cat.

The function raw_cat first defines variables for the file descriptor of standard output and for information about it. This information is used to pick a buffer size once. The same size is later the maximum number of bytes requested per read from the file descriptor that was passed to the function as a parameter. After that, the function enters a rather simple read-then-write loop:

while ((nr = read(rfd, buf, bsize)) != -1 && nr != 0)
  for (off = 0; nr; nr -= nw, off += nw)
    if ((nw = write(wfd, buf + off, (size_t)nr)) == 0 || nw == -1)
      err(1, "stdout");

The same code in a more verbose form is easier to describe:

nr = read(rfd, buf, bsize);
while (nr != -1 && nr != 0) {
  for (off = 0; nr; nr -= nw, off += nw) {
    nw = write(wfd, buf + off, (size_t)nr);
    if (nw == 0 || nw == -1) {
      err(1, "stdout");
    }
  }
  nr = read(rfd, buf, bsize);
}

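First, read tries to fill buf with up to bsize bytes from the file that rfd refers to, and nr is set to the number of bytes actually read. If nr is neither -1 (error) nor 0 (EOF), the while loop is entered. The for loop is then responsible for writing the buffer to stdout. Because write may accept fewer bytes than it was asked to, nw holds the number of bytes written by the latest call, and off is the offset of the first byte in the buffer that has not been written yet. On every pass through the for loop, nr is reduced by nw and off is advanced by nw. If write returns 0 or -1, err prints a message and exits with status 1. When nr reaches 0, the next chunk is read into the buffer. This goes on until EOF is reached or read fails. After that the loop ends and the function returns.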
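To try the pattern outside of cat, here is a minimal sketch of the setup and the loop as one standalone function. The name copy_fd, the wfd parameter and the -1 return value are my own assumptions for illustration. As far as I read it, the original writes to stdout, keeps its buffer in a static variable and calls err instead of returning. Sizing the buffer as the larger of st_blksize and BUFSIZ is also my reading of the original, not a quote from it.

```c
#include <sys/stat.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of cat.c: copy everything from rfd to wfd.
 * Returns 0 on success, -1 on a read, write, or allocation error.
 */
int
copy_fd(int rfd, int wfd)
{
	struct stat sbuf;
	size_t bsize;
	ssize_t nr, nw, off;
	char *buf;

	/* Ask the output descriptor for its preferred I/O block size. */
	if (fstat(wfd, &sbuf) == -1)
		return -1;
	/* Assumption: use the larger of that block size and BUFSIZ. */
	bsize = sbuf.st_blksize > BUFSIZ ?
	    (size_t)sbuf.st_blksize : (size_t)BUFSIZ;
	if ((buf = malloc(bsize)) == NULL)
		return -1;

	/* read() returns 0 at EOF and -1 on error; both end the loop. */
	while ((nr = read(rfd, buf, bsize)) > 0) {
		/* write() may take fewer bytes than asked; retry the rest. */
		for (off = 0; nr > 0; nr -= nw, off += nw) {
			nw = write(wfd, buf + off, (size_t)nr);
			if (nw == 0 || nw == -1) {
				free(buf);
				return -1;
			}
		}
	}

	free(buf);
	/* nr is 0 after a clean EOF and -1 after a failed read. */
	return nr == -1 ? -1 : 0;
}
```

Calling copy_fd(STDIN_FILENO, STDOUT_FILENO) from a main function would give a tiny cat that only handles standard input. Returning an error code instead of calling err leaves the decision about exiting to the caller, which is convenient for experimenting but differs from what cat does.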

I came for a great and simple implementation of reading files, but was surprised by a very nice and elegant solution for buffered output.