HTB FFModule

FFModule uses common malware techniques: it obfuscates itself, injects its code into Firefox, and hooks one of Firefox's internal routines. Analyzing malware is a difficult task, but dynamic analysis frameworks such as Sharem can help a lot. Since I don't like Windows, I decided to solve this one without running the malware in a Windows environment, relying only on (unreliable) Linux emulators. This made the task harder, but definitely more interesting.


Kaddate

Difficulty : Medium
Category : Reversing


First look

At first glance, the program is a classic 64-bit PE executable.

~ $ file ffmodule.exe 
ffmodule.exe: PE32+ executable for MS Windows 6.00 (console), x86-64, 6 sections

Opening the program in Ghidra shows the first shady thing:

The main function uses XOR obfuscation to hide a part of the binary. Let's retrieve the decrypted bytes using Qiling:

from qiling import Qiling
from qiling.const import QL_VERBOSE

SHELLCODE_ADDR = 0x140017000            # location of the obfuscated payload
SHELLCODE_SIZE = 1444
SHELLCODE_DECRYPTED_ADDR = 0x1400011aa  # address hooked to dump the decrypted payload

def hook_extractShellcode(ql: Qiling) -> None:
    # Read the payload from emulated memory and save it to disk.
    shellcode = bytes(ql.mem.read(SHELLCODE_ADDR, SHELLCODE_SIZE))
    with open("./shellcode.bin", "wb") as f:
        f.write(shellcode)
    print("shellcode extracted")

if __name__ == "__main__":
    ql = Qiling(["./rootfs/x8664_windows/ffmodule.exe"], rootfs="./rootfs/x8664_windows/", verbose=QL_VERBOSE.DISABLED)
    ql.hook_address(hook_extractShellcode, SHELLCODE_DECRYPTED_ADDR)
    ql.run()

Process Injection

Now that we have "deobfuscated" this payload, let's dive into the second function of the malware:

The decompiled view is straightforward: the program walks through the running Windows processes looking for "firefox.exe" and then injects the previously decrypted payload into it. From this point on, we'll call this payload the shellcode.

The Shellcode

The first step : PEB Walking and GetProcAddress

Ok, so this is where it gets more difficult. Let's look at the first function of the shellcode, the most important and the most useful one to understand:

The prologue of the function reads GS:[0x30] and then performs a chain of dereferences on the result.

  1. GS:[0x30] on 64-bit Windows holds a pointer to the TEB (Thread Environment Block), which contains all the information about the current thread.
  2. RAX + 0x60 reads the PPEB, a pointer to the PEB (Process Environment Block), which contains all the information about the current process.
  3. RAX + 0x18 reads the PPEB_LDR_DATA (the loader data), which is populated by the loader. All information about the loaded modules is in there.
  4. RAX + 0x20 reads the field InMemoryOrderModuleList, the head of a doubly linked list of all loaded modules, nominally ordered by their placement in memory. Each node of this list is embedded in a bigger struct called LDR_DATA_TABLE_ENTRY, which contains the information about one module.
  5. RAX, qword ptr [RAX]: this instruction reads the first field of the linked-list node (the Flink field), which yields the next module in the linked list.
  6. RAX, qword ptr [RAX]: same instruction again. After this, the register RAX points to the third module in the list.
  7. RAX + 0x20: this one took me a while to understand. RAX points to the InMemoryOrderLinks field, which sits at offset 0x10 of the LDR_DATA_TABLE_ENTRY struct; adding 0x20 therefore lands on offset 0x30 of the LDR_DATA_TABLE_ENTRY, which is the DllBase field.

In summary, this first step of the function sets R8 to the DllBase address of the third module in Firefox's module list. Let's continue:
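The pointer chain can be replayed on a toy memory model to make the offsets concrete. This is only a sketch: the addresses in `fake_memory` and the helper name `third_module_base` are made up for illustration, and only the offsets come from the walkthrough above.

```python
# Offsets from the walkthrough (x64 Windows structure layouts).
TEB_PEB = 0x60              # TEB -> pointer to the PEB
PEB_LDR = 0x18              # PEB -> pointer to PEB_LDR_DATA
LDR_IN_MEMORY_ORDER = 0x20  # PEB_LDR_DATA.InMemoryOrderModuleList (Flink)
LINKS_TO_DLLBASE = 0x20     # InMemoryOrderLinks (0x10) -> DllBase (0x30)


def third_module_base(read_qword, teb):
    """Follow the same dereference chain as the shellcode prologue."""
    peb = read_qword(teb + TEB_PEB)
    ldr = read_qword(peb + PEB_LDR)
    entry = read_qword(ldr + LDR_IN_MEMORY_ORDER)  # first list entry
    entry = read_qword(entry)                      # Flink -> second entry
    entry = read_qword(entry)                      # Flink -> third entry
    return read_qword(entry + LINKS_TO_DLLBASE)


# Hypothetical memory: address -> 64-bit value, just enough for the walk.
fake_memory = {
    0x1000 + TEB_PEB: 0x2000,
    0x2000 + PEB_LDR: 0x3000,
    0x3000 + LDR_IN_MEMORY_ORDER: 0x4010,  # links of the first entry
    0x4010: 0x5010,                        # first -> second
    0x5010: 0x6010,                        # second -> third
    0x6010 + LINKS_TO_DLLBASE: 0x7FF800000000,
}

base = third_module_base(fake_memory.__getitem__, teb=0x1000)
print(hex(base))
```

Passing the memory reader as a callback keeps the walk independent of where the bytes come from, so the same function could be fed an emulator's memory accessor instead of a dictionary.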

  1. R8 + 0x3C: the DllBase in R8 is the base address of the DLL in memory. That base can be cast to an IMAGE_DOS_HEADER. Offset 0x3C is the e_lfanew field, which holds the offset of the IMAGE_NT_HEADERS64, the headers of modern PE files.
  2. RAX + 0x88 is basically: RDX = RAX->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress. Offset 0x88 fetches the RVA of the export directory (IMAGE_EXPORT_DIRECTORY), which describes all the functions exported by the DLL.
  3. RDX + 0x20: finally, this reads the AddressOfNames field, the RVA of an array of RVAs pointing to the ASCII names of the exported functions.

This method of locating libraries and functions is called PEB walking: we walk through the PEB and related loader structures to retrieve the information the malware needs.
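The header offsets above can be exercised with a few lines of Python. This is a sketch under assumptions: the image is laid out as in memory (so an RVA is directly an offset), the synthetic `image` below is not a real DLL, and the NumberOfNames offset (0x18) comes from the PE format rather than from the shellcode listing.

```python
import struct


def export_names(image: bytes):
    """List export names from a PE image mapped as in memory (RVA == offset)."""
    e_lfanew = struct.unpack_from("<I", image, 0x3C)[0]
    export_rva = struct.unpack_from("<I", image, e_lfanew + 0x88)[0]
    number_of_names = struct.unpack_from("<I", image, export_rva + 0x18)[0]
    names_rva = struct.unpack_from("<I", image, export_rva + 0x20)[0]
    names = []
    for i in range(number_of_names):
        name_rva = struct.unpack_from("<I", image, names_rva + 4 * i)[0]
        end = image.index(b"\0", name_rva)  # names are NUL-terminated ASCII
        names.append(image[name_rva:end].decode("ascii"))
    return names


# Synthetic image containing only the fields read above.
image = bytearray(0x400)
struct.pack_into("<I", image, 0x3C, 0x80)            # e_lfanew
struct.pack_into("<I", image, 0x80 + 0x88, 0x200)    # export directory RVA
struct.pack_into("<I", image, 0x200 + 0x18, 2)       # NumberOfNames
struct.pack_into("<I", image, 0x200 + 0x20, 0x240)   # AddressOfNames
struct.pack_into("<II", image, 0x240, 0x300, 0x320)  # two name RVAs
image[0x300:0x300 + 15] = b"GetProcAddress\0"
image[0x320:0x320 + 13] = b"VirtualAlloc\0"

print(export_names(bytes(image)))
```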

Without going into too much detail, the next step of the function iterates over all the function names exported by the third DLL (as described above), hashes each name character by character, and checks whether the result matches a CRC32 value hardcoded in the malware.
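For readers without an SSE4.2 toolchain at hand, the hashing loop can be approximated in pure Python. This sketch assumes the sample relies on the x86 crc32 instruction, which computes CRC-32C (Castagnoli), with an initial value of 0xFFFFFFFF and no final XOR, as the C reimplementation further down does. The function names are mine.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial, bit-reflected


def crc32c_update(crc: int, byte: int) -> int:
    """Software equivalent of one crc32 r32, r/m8 step."""
    crc ^= byte
    for _ in range(8):
        crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc


def hash_name(name: bytes) -> int:
    """Hash an export name: start at 0xFFFFFFFF, no final XOR (assumption)."""
    crc = 0xFFFFFFFF
    for b in name:
        crc = crc32c_update(crc, b)
    return crc


print(hex(hash_name(b"GetProcAddress")))
```

The printed value can be compared against the constants extracted from the shellcode. The standard CRC-32C check value for "123456789" (0xE3069283) is recovered by XOR-ing the result with 0xFFFFFFFF, which is a quick way to validate the loop.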

To verify which function is retrieved, let's first check what's the third module loaded by Firefox.

~ $ WINEPATH=./firefox/x8664_windows/Windows/System32 WINEDEBUG=+loaddll wine ./firefox/firefox.exe
002c:trace:loaddll:build_module Loaded L"C:\\windows\\system32\\wineboot.exe" at 0000000140000000: builtin
002c:trace:loaddll:build_module Loaded L"C:\\windows\\system32\\kernelbase.dll" at 00006FFFFF3C0000: builtin
002c:trace:loaddll:build_module Loaded L"C:\\windows\\system32\\kernel32.dll" at 00006FFFFFA10000: builtin

Let's assume it's kernel32.dll. Now I want to retrieve all the exports of the DLL, so let's do it in Python.

import pefile
import sys

if __name__ == "__main__":
    pe = pefile.PE(sys.argv[1])
    for exp in pe.DIRECTORY_ENTRY_EXPORT.symbols:
        # Ordinal-only exports have no name (exp.name is None): skip them.
        if exp.name is not None:
            print(exp.name.decode())

~ $ python3 ./dump_pe_exports.py ./firefox/x8664_windows/Windows/System32/kernel32.dll > kernel32_exports

And finally, we need to reimplement the "hashing" algorithm of the malware. To stay as close as possible to the original implementation, I will use C intrinsics.

#include <dirent.h>
#include <errno.h>
#include <nmmintrin.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

uint32_t calculate_module_crc(const char *module) {
  uint32_t crc = 0xffffffffu;
  while (*module) {
    crc = _mm_crc32_u8(crc, (unsigned char)*module);
    module++;
  }
  return crc;
}

uint32_t parse_u32(const char *s) {
  if (!s)
    return 0;
  errno = 0;
  char *end = NULL;

  unsigned long uv = strtoul(s, &end, 0);
  if (errno) {
    perror("strtoul");
    return 0;
  }
  return (uint32_t)uv;
}

static char *trim_newline_and_space(char *s) {
  if (!s)
    return s;
  // trim newline and trailing CR/LF and spaces/tabs
  size_t len = strlen(s);
  while (len > 0) {
    char c = s[len - 1];
    if (c == '\n' || c == '\r' || c == ' ' || c == '\t') {
      s[len - 1] = '\0';
      len--;
    } else
      break;
  }
  // also skip leading spaces/tabs
  char *start = s;
  while (*start == ' ' || *start == '\t')
    start++;
  if (start != s)
    memmove(s, start, strlen(start) + 1);
  return s;
}

int main(int argc, char **argv) {
  if (argc != 3) {
    fprintf(stderr, "need 2 arguments\n");
    fprintf(stderr, "usage : %s <modules_file> <validators_file>\n", argv[0]);
    return EXIT_FAILURE;
  }

  const char *modules_file = argv[1];
  const char *validators_file = argv[2];

  /* --- Read validators file into dynamic array --- */
  FILE *vf = fopen(validators_file, "r");
  if (!vf) {
    fprintf(stderr, "Could not open validators file '%s': %s\n",
            validators_file, strerror(errno));
    return EXIT_FAILURE;
  }

  size_t validators_cap = 32;
  size_t validators_count = 0;
  uint32_t *validators = malloc(validators_cap * sizeof(uint32_t));
  if (!validators) {
    perror("malloc");
    fclose(vf);
    return EXIT_FAILURE;
  }

  char line[512];
  while (fgets(line, sizeof(line), vf)) {
    trim_newline_and_space(line);
    if (line[0] == '\0')
      continue; // skip empty lines
    uint32_t v = parse_u32(line);
    if (validators_count >= validators_cap) {
      size_t newcap = validators_cap * 2;
      uint32_t *tmp = realloc(validators, newcap * sizeof(uint32_t));
      if (!tmp) {
        perror("realloc");
        free(validators);
        fclose(vf);
        return EXIT_FAILURE;
      }
      validators = tmp;
      validators_cap = newcap;
    }
    validators[validators_count++] = v;
  }
  fclose(vf);

  if (validators_count == 0) {
    fprintf(stderr, "No validators loaded from %s\n", validators_file);
    free(validators);
    return EXIT_FAILURE;
  }

  /* --- Process modules file and check against validators --- */
  FILE *mf = fopen(modules_file, "r");
  if (!mf) {
    fprintf(stderr, "Could not open modules file '%s': %s\n", modules_file,
            strerror(errno));
    free(validators);
    return EXIT_FAILURE;
  }

  while (fgets(line, sizeof(line), mf)) {
    trim_newline_and_space(line);
    if (line[0] == '\0')
      continue;
    uint32_t crc = calculate_module_crc(line);

    bool matched = false;
    for (size_t i = 0; i < validators_count; ++i) {
      if (crc == validators[i]) {
        printf("MATCH: %s -> 0x%08x matches validator 0x%08x (index %zu)\n",
               line, crc, validators[i], i);
        matched = true;
      }
    }
  }

  fclose(mf);
  free(validators);
  return EXIT_SUCCESS;
}
~ $ ./calculate_functions_hashes ./kernel32_exports ./validators
MATCH: GetProcAddress -> 0xbc553b82 matches validator 0xbc553b82 (index 1)

And we got our function! The function then stores the address of GetProcAddress in the register R12 and returns.

The second step : PEB Walking and VirtualAlloc

The second step uses the same PEB walking technique to retrieve kernel32.DllBase into R14. It then calls a function located a tiny bit lower in the .text section.

We can see in the disassembler that the first instruction of this function is POP RDX, which pops the return address pushed by the CALL. As shown in the disassembler, the malware has stored a string right after the CALL instruction, so the return address is actually a pointer to that string. The malware uses this call/pop trick to hide its strings; the pattern is repeated multiple times throughout the sample, so I won't mention it again.
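A small helper can pull such inline strings out of the dumped shellcode. This is a sketch assuming the pattern is a near CALL rel32 (opcode E8) whose target sits right after the NUL-terminated string; the byte sequence in `code` is a synthetic example, not bytes from the sample.

```python
import struct


def string_after_call(code: bytes, call_offset: int) -> bytes:
    """Return the NUL-terminated string stored between a CALL and its target."""
    if code[call_offset] != 0xE8:
        raise ValueError("not a CALL rel32")
    rel = struct.unpack_from("<i", code, call_offset + 1)[0]
    return_address = call_offset + 5  # what CALL pushes on the stack
    target = return_address + rel     # where execution continues (the POP)
    return code[return_address:target].split(b"\0", 1)[0]


# Synthetic example: call over the string, then pop rdx (0x5A).
payload = b"VirtualAlloc\0"
code = b"\xE8" + struct.pack("<i", len(payload)) + payload + b"\x5A"

print(string_after_call(code, 0))
```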

This function calls GetProcAddress(Kernel32.DllBase, "VirtualAlloc"), then VirtualAlloc(...) to allocate a buffer in memory. The pointer to this buffer is stored on the stack. It then calls the next function.

The third step : nss3.PR_Write and VirtualProtect

This function calls GetProcAddress(Kernel32.DllBase, "VirtualProtect") and puts the result on the stack. The rest of the function is yet again some PEB walking and "hashing" to look for some hardcoded CRC32 values.

While the algorithm is the same, the offset applied to the InMemoryOrderModuleList entry is different: this time it is 0x50, which lands on LDR_DATA_TABLE_ENTRY.BaseDllName.Buffer (simply the name of the selected DLL). So the malware is looking for a specific DLL.
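The two offsets seen so far (0x20 and 0x50) are both relative to InMemoryOrderLinks, not to the start of the entry. The arithmetic can be checked with a short sketch; the field offsets below are the usual x64 LDR_DATA_TABLE_ENTRY layout and the helper name is mine.

```python
# Usual x64 offsets of the first LDR_DATA_TABLE_ENTRY fields.
LDR_DATA_TABLE_ENTRY = {
    "InLoadOrderLinks": 0x00,
    "InMemoryOrderLinks": 0x10,
    "InInitializationOrderLinks": 0x20,
    "DllBase": 0x30,
    "EntryPoint": 0x38,
    "SizeOfImage": 0x40,
    "FullDllName": 0x48,   # UNICODE_STRING
    "BaseDllName": 0x58,   # UNICODE_STRING
}
UNICODE_STRING_BUFFER = 0x08  # offset of Buffer inside a UNICODE_STRING


def absolute_offset(delta_from_links: int) -> int:
    """Convert an offset relative to InMemoryOrderLinks to an entry offset."""
    return LDR_DATA_TABLE_ENTRY["InMemoryOrderLinks"] + delta_from_links


print(hex(absolute_offset(0x20)))  # DllBase
print(hex(absolute_offset(0x50)))  # BaseDllName.Buffer
```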

To find the DLL the malware is looking for, I'll take the same approach as before. First, let's get the names of all of Firefox's and Windows's DLLs:

import os
import sys

files = []
for dirpath in sys.argv[1:]:
    for root, dirs, filenames in os.walk(dirpath):
        for name in filenames:
            if name.lower().endswith(".dll"):
                #files.append(os.path.basename(name))
                files.append(name)

for f in files:
    print(f)
~ $ python3 ./get_all_basedllNames.py ./firefox/ ./firefox/x8664_windows/Windows/System32 |sort -u > firefox_all_dllBaseNames

Then, let's write a second C program to handle DLL names:

int main(int argc, char **argv) {
  if (argc != 3) {
    fprintf(stderr, "need 2 arguments\n");
    fprintf(stderr, "usage : %s <modules_file> <validators_file>\n", argv[0]);
    return EXIT_FAILURE;
  }

  const char *modules_file = argv[1];
  const char *validators_file = argv[2];

  [...SNIP...]

  /* --- Process modules file and check against validators --- */
  FILE *mf = fopen(modules_file, "r");
  if (!mf) {
    fprintf(stderr, "Could not open modules file '%s': %s\n", modules_file,
            strerror(errno));
    free(validators);
    return EXIT_FAILURE;
  }

  while (fgets(line, sizeof(line), mf)) {
    trim_newline_and_space(line);
    if (line[0] == '\0')
      continue;

    /* tolower()/toupper() need #include <ctype.h> on top of the headers of
       the first program. In the first two variants the last 4 characters
       (".dll") keep their original case. */
    size_t len = strlen(line);
    size_t stem_len = len >= 4 ? len - 4 : 0; /* avoid unsigned underflow */

    char lineLower[len + 1];
    memcpy(lineLower, line, len + 1);
    for (size_t i = 0; i < stem_len; i++)
      lineLower[i] = (char)tolower((unsigned char)line[i]);
    uint32_t crcLower = calculate_module_crc(lineLower);

    char lineUpper[len + 1];
    memcpy(lineUpper, line, len + 1);
    for (size_t i = 0; i < stem_len; i++)
      lineUpper[i] = (char)toupper((unsigned char)line[i]);
    uint32_t crcUpper = calculate_module_crc(lineUpper);

    char lineFullUpper[len + 1];
    memcpy(lineFullUpper, line, len + 1);
    for (size_t i = 0; i < len; i++)
      lineFullUpper[i] = (char)toupper((unsigned char)line[i]);
    uint32_t crcFullUpper = calculate_module_crc(lineFullUpper);

    bool matched = false;
    for (size_t i = 0; i < validators_count; ++i) {
      uint32_t v = validators[i];

      if (v == crcLower) {
        printf("MATCH: %s -> 0x%08x matches validator 0x%08x (index %zu)\n",
               lineLower, (unsigned)crcLower, (unsigned)v, i);
      }
      if (v == crcUpper) {
        printf("MATCH: %s -> 0x%08x matches validator 0x%08x (index %zu)\n",
               lineUpper, (unsigned)crcUpper, (unsigned)v, i);
      }
      if (v == crcFullUpper) {
        printf("MATCH: %s -> 0x%08x matches validator 0x%08x (index %zu)\n",
               lineFullUpper, (unsigned)crcFullUpper, (unsigned)v, i);
      }
    }
  }

  fclose(mf);
  free(validators);
  return EXIT_SUCCESS;
}
~ $ gcc -o calculate_module_hashes calculate_module_hashes.c -msse4.2
~ $ ./calculate_module_hashes ./firefox_all_dllBaseNames ./validators 
MATCH: nss3.dll -> 0x779e27f4 matches validator 0x779e27f4 (index 0)
MATCH: NSS3.dll -> 0xc231d75d matches validator 0xc231d75d (index 1)
MATCH: NSS3.DLL -> 0x7c434195 matches validator 0x7c434195 (index 2)

And we got our DLL! The function then stores nss3.DllBase in R14. Let's continue with the following function.

This function calls GetProcAddress(nss3.DllBase, "PR_Write") and copies the function's code into the previously allocated buffer. PR_Write is an NSPR function exported by nss3.dll that Firefox uses to write to files and sockets. The function also calls VirtualProtect(AllocatedBuffer, PAGE_EXECUTE_READ) and VirtualProtect(pPR_Write, PAGE_EXECUTE_READWRITE).

The fourth step : Patching the nss3.PR_Write routine

The end of the payload patches the first bytes of the PR_Write function. To fully understand the result of this patching, I will emulate this part of the shellcode with Qiling, using lief to extract PR_Write from nss3.dll.

from qiling import Qiling
from qiling.const import QL_ARCH, QL_OS, QL_VERBOSE
from unicorn import UC_PROT_ALL
from capstone import Cs, CS_ARCH_X86, CS_MODE_64
import lief

def get_PR_Write():
    # Locate PR_Write in the on-disk nss3.dll and return 0x1000 bytes from its start.
    pe: lief.PE.Binary = lief.PE.parse("./firefox/nss3.dll")
    pPR_Write_rva = pe.get_export().find_entry("PR_Write").address
    pPR_Write = pe.rva_to_offset(pPR_Write_rva)
    with open("./firefox/nss3.dll", "rb") as nss3:
        nss3.seek(pPR_Write)
        return nss3.read(0x1000)

SHELLCODE = open("./shellcode.bin", "rb").read()
MARKER = (0x3300730073006e).to_bytes(8, "big")
PR_WRITE_BUFFER = get_PR_Write()
PR_WRITE_ADDRESS = 0x40000000
DEST_BUFFER_ADDRESS = 0x30000000
EMU_START = 0x140000544
EMU_STOP = 0x140000592

cs = Cs(CS_ARCH_X86, CS_MODE_64)

def GetAllocatedBuffer_hook(ql: Qiling) -> None:
    print("Hooking rsp request to get malware allocated buffer")
    ql.arch.regs.rax = DEST_BUFFER_ADDRESS

    # patch rip to the next instruction
    code = ql.mem.read(ql.arch.regs.rip, 16)
    insns = list(cs.disasm(code, ql.arch.regs.rip, count=1))
    ql.arch.regs.rip += insns[0].size if insns else 1
    return 


def stop_emu_hook(ql: Qiling) -> None:
    print("Hooking Stop emulation")
    ql.emu_stop()

if __name__ == "__main__":
    ql = Qiling(code=SHELLCODE, ostype=QL_OS.WINDOWS, archtype=QL_ARCH.X8664 ,rootfs="./firefox/x8664_windows/", verbose=QL_VERBOSE.DISASM)

    # Emulates the pointer returned by GetProcAddr(nss3_DllBase, 'PR_Write');
    ql.mem.map(PR_WRITE_ADDRESS, len(PR_WRITE_BUFFER), UC_PROT_ALL, info="Emulates the pointer returned by GetProcAddr(nss3_DllBase, 'PR_Write');")
    ql.mem.write(PR_WRITE_ADDRESS, PR_WRITE_BUFFER)
    ql.arch.regs.r15 = PR_WRITE_ADDRESS

    # Emulates the pointer returned by VirtualAlloc()
    ql.mem.map(DEST_BUFFER_ADDRESS, len(PR_WRITE_BUFFER), UC_PROT_ALL, info="Emulates the pointer returned by VirtualAlloc")
    ql.mem.write(DEST_BUFFER_ADDRESS, PR_WRITE_BUFFER)

    # Hook the stack request
    ql.hook_address(GetAllocatedBuffer_hook, 0x140000568)
    ql.hook_address(stop_emu_hook, 0x14000058e)

    ql.run(begin=EMU_START, end=EMU_STOP)
    open("patched_PR_Write.bin", "wb").write(ql.mem.read(PR_WRITE_ADDRESS, 0x1000))
[=]     0000000140000544 [[shellcode]          + 0x000544]  41 c7 07 41 55 41 54 mov                  dword ptr [r15], 0x54415541
[=]     000000014000054b [[shellcode]          + 0x00054b]  66 41 c7 47 04 49 bc mov                  word ptr [r15 + 4], 0xbc49
[=]     0000000140000552 [[shellcode]          + 0x000552]  e8 05 00 00 00       call                 0x14000055c
[=]     000000014000055c [[shellcode]          + 0x00055c]  58                   pop                  rax
[=]     000000014000055d [[shellcode]          + 0x00055d]  49 89 47 06          mov                  qword ptr [r15 + 6], rax
[=]     0000000140000561 [[shellcode]          + 0x000561]  66 41 c7 47 0e 49 bd mov                  word ptr [r15 + 0xe], 0xbd49
[=]     0000000140000568 [[shellcode]          + 0x000568]  48 8b 04 24          mov                  rax, qword ptr [rsp]
Hooking rsp request to get malware allocated buffer
[=]     000000014000056c [[shellcode]          + 0x00056c]  49 89 47 10          mov                  qword ptr [r15 + 0x10], rax
[=]     0000000140000570 [[shellcode]          + 0x000570]  41 c7 47 18 41 ff e4 cc mov                  dword ptr [r15 + 0x18], 0xcce4ff41
[=]     0000000140000578 [[shellcode]          + 0x000578]  4c 89 f9             mov                  rcx, r15
[=]     000000014000057b [[shellcode]          + 0x00057b]  48 c7 c2 00 10 00 00 mov                  rdx, 0x1000
[=]     0000000140000582 [[shellcode]          + 0x000582]  49 c7 c0 20 00 00 00 mov                  r8, 0x20
[=]     0000000140000589 [[shellcode]          + 0x000589]  4c 8d 4c 24 10       lea                  r9, [rsp + 0x10]
[=]     000000014000058e [[shellcode]          + 0x00058e]  48 81 ec 00 01 00 00 sub                  rsp, 0x100
Hooking Stop emulation

Let's now inspect the patched function in Ghidra:

The new function prologue is simple: it saves a couple of registers and jumps into a subroutine of the shellcode. Let's also have a look at it, I guess...
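The bytes written in the emulation trace can be reassembled into the 28-byte stub that replaces the start of PR_Write. The sketch below rebuilds it with struct; the two addresses are the ones observed in my emulation (the return address of the call at 0x140000552 and the fake buffer address 0x30000000), and the function and parameter names are mine.

```python
import struct


def build_hook_stub(hook_address: int, saved_copy_address: int) -> bytes:
    """Rebuild the stub the shellcode writes over the start of PR_Write."""
    return (
        b"\x41\x55"                                            # push r13
        + b"\x41\x54"                                          # push r12
        + b"\x49\xBC" + struct.pack("<Q", hook_address)        # mov r12, imm64
        + b"\x49\xBD" + struct.pack("<Q", saved_copy_address)  # mov r13, imm64
        + b"\x41\xFF\xE4"                                      # jmp r12
        + b"\xCC"                                              # int3
    )


stub = build_hook_stub(0x140000557, 0x30000000)
print(stub.hex())
```

So R12 carries the address of the hook inside the shellcode and R13 the address of the buffer holding the copy of PR_Write, which presumably lets the hook reach the original code once it is done.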

The fifth step : The Malware Hook

The hook first checks that the buffer is longer than 4 bytes and that it starts with POST. I guess this is a way to only hook POST requests and steal usernames, passwords, cookies, and such... It then uses the first function we analyzed to retrieve GetProcAddress.
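In Python terms, the filter at the top of the hook boils down to something like the following sketch. The function name is mine, and the exact comparison is my reading of the decompiled code.

```python
def should_intercept(buffer: bytes) -> bool:
    """Mimic the hook's filter: only buffers that look like HTTP POST requests."""
    return len(buffer) > 4 and buffer.startswith(b"POST")


print(should_intercept(b"POST /login HTTP/1.1\r\n"))
print(should_intercept(b"GET / HTTP/1.1\r\n"))
```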

The rest of the code uses PEB walking and CRC32 "hashing" to locate a specific DLL, but I couldn't find which one using static analysis... Anyway, in the meantime the shellcode also builds strings on the stack.

1400170c2 158    MOV           dword ptr [RSP + Stack[-0xe6]],"atSA"
...
1400170de   0    MOV           word ptr [RSP + Stack[0x70]],"SW"
...
140017105   0    MOV           word ptr [RSP + Stack[0x76]],"tr"
...
140017115   0    MOV           dword ptr [RSP + Stack[0x78]],00h,00h,"pu"

This gives the string WSAStartup, a function of the Winsock API (declared in winsock2.h). It initializes the use of Winsock by the process and fills a WSADATA struct, which is required before using Windows sockets.
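Ghidra prints multi-character immediates in reading order of the constant, while x86 stores them little-endian, which is why the chunks look reversed. The sketch below replays the four stores into a small buffer. One assumption: the dword store is placed at offset 0x72, between the two word stores, since the listing shows a different label for that operand.

```python
import struct

# (stack offset, struct format, immediate) for each MOV in the listing.
stores = [
    (0x72, "<I", 0x61745341),  # "atSA" (offset 0x72 is an assumption)
    (0x70, "<H", 0x5357),      # "SW"
    (0x76, "<H", 0x7472),      # "tr"
    (0x78, "<I", 0x00007075),  # 00h,00h,"pu"
]


def rebuild(stores, base=0x70, size=0x0C):
    """Replay little-endian stores and read back the NUL-terminated string."""
    stack = bytearray(size)
    for offset, fmt, value in stores:
        struct.pack_into(fmt, stack, offset - base, value)
    return bytes(stack).split(b"\0", 1)[0].decode("ascii")


print(rebuild(stores))  # WSAStartup
```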

The rest of the function effectively creates an IPv4 UDP socket:

GetProcAddr(WinSock.DllBase, "WSAStartup")
WinSock.WSAStartup(MAKEWORD(2, 2), &wsaData)   // request Winsock version 2.2
GetProcAddr(WinSock.DllBase, "socket")
WinSock.socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP)
GetProcAddr(WinSock.DllBase, "sendto")
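For comparison, the same socket setup looks like this with Python's socket module. It is only an equivalent sketch: the destination passed to sendto is not covered here, so the example stops after creating the socket, and the helper name is mine.

```python
import socket


def open_udp_socket() -> socket.socket:
    """Create the same kind of socket as the hook: IPv4, datagram, UDP."""
    return socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)


sock = open_udp_socket()
print(sock.family, sock.type, sock.proto)
sock.close()
```

UDP needs no connection, so a single sendto call per intercepted buffer is enough to get the data out, which probably explains the choice.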

And here is the last function :)

The function retrieves some data through the return address left on the stack (the same call/pop trick as before) and applies a few substitutions to it (XOR, ROL, and ADD). Since all these substitutions are reversible, let's deobfuscate this data:

package main

import "fmt"

// rol8 rotates b left by n bits (0 < n < 8).
func rol8(b byte, n uint) byte {
    return (b << n) | (b >> (8 - n))
}

func main() {
    encBuf := []byte{
        0x54, 0xd5, 0x13, 0x3a, 0x41,
        0x99, 0xd4, 0xb6, 0x93, 0x74,
        0x19, 0x41, 0x97, 0x61, 0x5a,
        0xb6, 0x54, 0x61, 0x61, 0x38,
        0x81, 0x98, 0xb7, 0xb6, 0xf4,
        0xe1, 0xd8, 0xb9, 0x77, 0x19,
        0xf7, 0xfa,
    }

    // For each byte: add 0xED (wrapping at 8 bits), rotate left by 3, XOR with 0x42.
    for _, e := range encBuf {
        dec := rol8(e+0xED, 3) ^ 0x42
        fmt.Print(string(rune(dec)))
    }
    fmt.Println()
}

And we've got our flag ;)
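As a sanity check on the "reversible" claim, the decoding can be paired with its inverse. This Python sketch mirrors the Go program; the `encode` direction (XOR, rotate right, subtract) is simply derived by inverting each decoding step and is not taken from the sample's disassembly.

```python
ENC = bytes([
    0x54, 0xd5, 0x13, 0x3a, 0x41, 0x99, 0xd4, 0xb6,
    0x93, 0x74, 0x19, 0x41, 0x97, 0x61, 0x5a, 0xb6,
    0x54, 0x61, 0x61, 0x38, 0x81, 0x98, 0xb7, 0xb6,
    0xf4, 0xe1, 0xd8, 0xb9, 0x77, 0x19, 0xf7, 0xfa,
])


def rol8(v: int, n: int) -> int:
    return ((v << n) | (v >> (8 - n))) & 0xFF


def ror8(v: int, n: int) -> int:
    return ((v >> n) | (v << (8 - n))) & 0xFF


def decode(data: bytes) -> bytes:
    """Same steps as the Go program: add 0xED, rotate left by 3, XOR 0x42."""
    return bytes(rol8((b + 0xED) & 0xFF, 3) ^ 0x42 for b in data)


def encode(data: bytes) -> bytes:
    """Inverse of decode: XOR 0x42, rotate right by 3, subtract 0xED."""
    return bytes((ror8(b ^ 0x42, 3) - 0xED) & 0xFF for b in data)


flag = decode(ENC)
print(flag.decode("ascii"))
print(encode(flag) == ENC)
```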

References

The windows bible

  • https://www.geoffchappell.com

Qiling

  • https://docs.qiling.io/en/latest/
  • https://n1ght-w0lf.github.io/tutorials/qiling-for-malware-analysis-part-2/

Process injection

  • https://syrion.me/rustware-part-1-shellcode-process-injection-development/

PEB Walking

  • https://www.m0n1x90.dev/blog/peb-teb-eat
  • https://yelhamer.github.io/posts/Resolving-Windows-API-Functions-via-the-PEB_LDR_DATA-Structure/
  • https://0xrick.github.io/win-internals/pe3/
  • https://0xrick.github.io/win-internals/pe4/
  • https://stackoverflow.com/questions/68185603/how-image-dos-header-works
  • https://dev.to/wireless90/exploring-the-export-table-windows-pe-internals-4l47
  • https://sachiel-archangel.medium.com/how-to-analyze-dll-address-acquisition-process-593bc8a54988

Firefox PR_Write function

  • https://firefox-source-docs.mozilla.org/nspr/reference/pr_write.html