Yara rules - Anti-malware support

29 Oct 2023

When searching for adversaries on a system or network, yara rules are used to detect IOCs of malware and other bad actions.

These rules are work like simple regular expressions, except that they are supplied with more metadata about the details of what was found. A rule consists of a strings section, where variables are assigned to text in ASCII or hexadecimal form, and a condition section, where you define when the rule applies and under what conditions. A simple rule taken from the developer manual looks like this:

rule ExampleRule
{
  strings:
    $my_text_string = "text here"
    $my_hex_string = { E2 34 A1 C8 23 FB }

  condition:
    $my_text_string or $my_hex_string
}

These rules can assist when searching for malware families. Assuming we have different malware samples of the same family at hand. With a debugger it is possible to search for matching IOCs across these samples, that are also unique to this kind of program. I downloaded and compiled five versions of GNU's grep from the official site and assume that these are the binary samples of an infamous dangerous malware family.

To create rules for this programm a binary analysis of the assembler code must be made. This can be acomplished with the GNU debugger (gdb). Also a textual analysis of the printable strings is done, to find similarities across the binaries. The publish date of the binary ranges from 1998 to 2023. This wide time range is unusual for malware. If you came across a few samples with this kind of history, it's probably better to create a new rule. To adapt to anti-malware techniques much of the initial code of it would've been changed anyway over time. Through the strings and gdb command the following similarities could be found inside the different versions:

rule grep
{
  strings:
    $1 = "bug-gnu-utils"
    // 2.3 + 2.5.1

    $2 = "bug-grep"
    // 2.6.2 + 3.6 + 3.11

    $3 = "http.://www.gnu.org/software/grep/"
    // 2.6.2 + 3.6 + 3.11

    $4 = "Written by Mike Haertel and others; see"
    // 3.6 + 3.11

    $5 = {83 3d ?? ?? ?? 00 00 0f 85 ?? ?? 00 00 48 83 7c 24 ?8 00 0f (84|85) ?? ??}
    // 2.3 + 2.5.1 + 2.6.2 (show_help)
    // cmpl         $0x0,0x??????(%rip)
    // jne          ????
    // cmpq         $0x0,0x?8(%rsp)
    // (je|jne)     ????

    $6 = {8b 05 ?? ?? ?? 00 85 c0 0f 85 ?? ?? 00 00}
    // 3.6 + 3.11 (show_help)
    // mov    0x??????(%rip),%eax
    // test   %eax,%eax
    // jne    ???

  condition:
    $1 or $2 or $3 or $4 or $5 or $6
}

If grep is executed, the general behaviour is to output a short help message and quit. The assembly code around showing the help message hasn't changed much, so it's ideal to use it inside a rule. Just like the strings, the assembly instructions can be directly assigned to a variable. The hex numbers used can also be found in the binary files of the files.

Because of the unspecific condition, this rule can not be used for detecting the listed grep versions with much confidence. The reason is because the chosen assembly instructions are not that unique to grep as expected. They appear in a whole lot other binaries on a linux system. To mitigate this error, the condition can be modified like so:

condition:
    1 of ($1,$2,$3,$4) and ($5 or $6)

The condition given makes sure that the likelihood of detecting false positives is low. With yara several more advanced condition features can be used, like locations, the count of found strings, and more.

Further reading: yara developer manual, yara rule of the neshta malware family.