PCAP parsing evaluation - C

20 Aug 2024

This is a small experiment that implements and evaluates a pcap parser. The language used in this post is C. All the parsing is done on a sample dump shown below.

The rules I made up for this evaluation are simple:

The dump file used in this evaluation contains an ICMP echo request and the reply to it.

20:21:45.590918 wlp1s0 Out IP (tos 0x0, ttl 64, id 63347, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.1.21 > 9.9.9.9: ICMP echo request, id 5, seq 1, length 64
        0x0000:  4500 0054 f773 4000 4001 be60 c0a8 b21a  E..T.s@.@..`....
        0x0010:  0909 0909 0800 f50f 0005 0001 b9ac 6066  ..............`f
        0x0020:  0000 0000 2104 0900 0000 0000 1011 1213  ....!...........
        0x0030:  1415 1617 1819 1a1b 1c1d 1e1f 2021 2223  .............!"#
        0x0040:  2425 2627 2829 2a2b 2c2d 2e2f 3031 3233  $%&'()*+,-./0123
        0x0050:  3435 3637                                4567
20:21:45.608034 wlp1s0 In  IP (tos 0x0, ttl 59, id 19032, offset 0, flags [none], proto ICMP (1), length 84)
    9.9.9.9 > 192.168.1.21: ICMP echo reply, id 5, seq 1, length 64
        0x0000:  4500 0054 4a58 0000 3b01 b07c 0909 0909  E..TJX..;..|....
        0x0010:  c0a8 b21a 0000 fd0f 0005 0001 b9ac 6066  ..............`f
        0x0020:  0000 0000 2104 0900 0000 0000 1011 1213  ....!...........
        0x0030:  1415 1617 1819 1a1b 1c1d 1e1f 2021 2223  .............!"#
        0x0040:  2425 2627 2829 2a2b 2c2d 2e2f 3031 3233  $%&'()*+,-./0123
        0x0050:  3435 3637                                4567

As the source code for the C program got quite long (~300 sloc), I decided to publish it on a separate page. On this page, only some excerpts are discussed. The complete code can be found here.

In C, the first question that arises when dealing with binary data is whether it's okay to use the in-memory layout of a struct for parsing. With #pragma pack, the padding between struct members can be removed so that the member offsets match the field offsets in a pcap file, and a parser could then read a whole header straight into the struct. I decided not to do this and assigned each field of the protocols individually. Parsing the pcap header now works like this:

int parse_pcap_header(pcap_header *pkg_struct, FILE *fp) {
  /* one fread per field; fread returns the number of items read */
  fread(&pkg_struct->magic, 4, 1, fp);
  fread(&pkg_struct->major_version, 2, 1, fp);
  ...

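This is more code, but as far as I can tell it's the recommended practice: struct padding and alignment are decided by the compiler and the platform ABI, so reading a whole struct at once would mean making assumptions about both.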
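A related assumption is byte order: reading a field directly into an integer only works when the file was written in the host's byte order. The following is a minimal sketch of a variant that reads the 24 header bytes into a buffer and assembles each field explicitly. The names pcap_header_sketch, get_u16, get_u32, decode_pcap_header and read_pcap_header are made up for this illustration and are not taken from the program discussed here; only the microsecond magic number 0xa1b2c3d4 is handled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical struct for this sketch: the 24-byte pcap file header. */
typedef struct {
  uint32_t magic;
  uint16_t major_version;
  uint16_t minor_version;
  int32_t  thiszone;
  uint32_t sigfigs;
  uint32_t snaplen;
  uint32_t network;
} pcap_header_sketch;

/* Assemble a 16-bit value from two bytes in the given byte order. */
static uint16_t get_u16(const unsigned char *p, int big_endian) {
  unsigned hi = big_endian ? p[0] : p[1];
  unsigned lo = big_endian ? p[1] : p[0];
  return (uint16_t)((hi << 8) | lo);
}

/* Assemble a 32-bit value from four bytes in the given byte order. */
static uint32_t get_u32(const unsigned char *p, int big_endian) {
  uint32_t v = 0;
  for (int i = 0; i < 4; i++)
    v = (v << 8) | p[big_endian ? i : 3 - i];
  return v;
}

/* Decode 24 raw bytes. Returns 0 on success, -1 on an unknown magic. */
int decode_pcap_header(pcap_header_sketch *h, const unsigned char b[24]) {
  int be;
  assert(h != NULL && b != NULL);
  /* The magic number tells us which byte order the writer used. */
  if (get_u32(b, 0) == 0xa1b2c3d4u)      be = 0;
  else if (get_u32(b, 1) == 0xa1b2c3d4u) be = 1;
  else return -1; /* e.g. the nanosecond variant, not handled here */
  h->magic         = 0xa1b2c3d4u;
  h->major_version = get_u16(b + 4, be);
  h->minor_version = get_u16(b + 6, be);
  h->thiszone      = (int32_t)get_u32(b + 8, be);
  h->sigfigs       = get_u32(b + 12, be);
  h->snaplen       = get_u32(b + 16, be);
  h->network       = get_u32(b + 20, be);
  return 0;
}

/* Read the header from a stream. Returns -1 on a short read. */
int read_pcap_header(pcap_header_sketch *h, FILE *fp) {
  unsigned char b[24];
  assert(fp != NULL);
  if (fread(b, 1, sizeof b, fp) != sizeof b)
    return -1;
  return decode_pcap_header(h, b);
}
```

The trade-off is the same as above: a bit more code, but no dependence on struct layout or on the byte order of the machine running the parser.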

The next thing to implement was parsing the data of the captured packets. Here I used memcpy on the packet buffer instead of reading directly from the stream, as this was a more convenient way of managing the bytes. Each function that parses a protocol layer also returns the number of bytes it consumed, so the program can keep a running total of the bytes parsed so far. If this total reaches or exceeds the capture length given in the record header, the program terminates with an error.

bytes_read = parse_icmp_echo_reply(&pkg_icmp_echo, packet);
bytes_parsed += bytes_read;
if (bytes_parsed < pkg_record.cap_len)
  packet = &packet[bytes_read];  /* advance to the next layer */
else
  exit(EXIT_FAILURE);            /* nothing left within cap_len */
print_icmp4(&pkg_icmp_echo);
...
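The byte accounting described above could also be folded into a small cursor type that refuses to hand out more bytes than remain. This is only a sketch under my own assumptions: cursor, cursor_take, ipv4_sketch, icmp_echo_sketch, parse_ipv4 and parse_icmp_echo are hypothetical names, not the ones from the program, and errors are returned as -1 instead of calling exit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cursor: current position plus the bytes still available. */
typedef struct {
  const unsigned char *pos;
  size_t remaining;
} cursor;

/* Copy n bytes out and advance; -1 if fewer than n bytes remain. */
static int cursor_take(cursor *c, void *dst, size_t n) {
  assert(c != NULL && dst != NULL);
  if (n > c->remaining)
    return -1;
  memcpy(dst, c->pos, n);
  c->pos += n;
  c->remaining -= n;
  return 0;
}

typedef struct {
  uint8_t version, ihl, ttl, protocol;
  uint16_t total_length;
} ipv4_sketch;

typedef struct {
  uint8_t type, code;
  uint16_t id, seq;
} icmp_echo_sketch;

/* Parse the fixed 20-byte IPv4 header and skip any options. */
int parse_ipv4(cursor *c, ipv4_sketch *ip) {
  unsigned char b[20];
  size_t options;
  if (cursor_take(c, b, sizeof b) != 0)
    return -1;
  ip->version = b[0] >> 4;
  ip->ihl = b[0] & 0x0f;              /* header length in 32-bit words */
  ip->total_length = (uint16_t)((b[2] << 8) | b[3]);
  ip->ttl = b[8];
  ip->protocol = b[9];
  if (ip->version != 4 || ip->ihl < 5)
    return -1;
  options = (size_t)(ip->ihl - 5) * 4;
  if (options > c->remaining)
    return -1;
  c->pos += options;
  c->remaining -= options;
  return 0;
}

/* Parse the 8-byte ICMP echo header (type, code, checksum, id, seq). */
int parse_icmp_echo(cursor *c, icmp_echo_sketch *e) {
  unsigned char b[8];
  if (cursor_take(c, b, sizeof b) != 0)
    return -1;
  e->type = b[0];
  e->code = b[1];
  e->id = (uint16_t)((b[4] << 8) | b[5]);
  e->seq = (uint16_t)((b[6] << 8) | b[7]);
  return 0;
}
```

Fed with the first 28 bytes of the echo request from the dump above (starting at 4500 0054), this should yield a total length of 84, a TTL of 64, protocol 1, and an ICMP type 8 with id 5 and seq 1.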

Concluding: It was interesting to read about the artificial link layer (Linux cooked capture) that tcpdump uses when capturing on the "any" interface. It's also good to know that the timestamp carried in an ICMP echo request is used to calculate round-trip times: because the payload has to be sent back unchanged in the echo reply, the sender can compare it with the time the reply arrives. Every time I write something in C, I have the feeling that I have made some mistakes by sticking to old coding standards. I guess the only way around this is to read more C code. The complexity of writing this parser from scratch was okay. In general, it was good practice, and it should help me estimate the time needed to complete other projects in C.