How to unpack malicious SHC-compiled scripts with Qiling Framework

Malware authors often employ a variety of techniques to make life difficult for security researchers. These techniques can include using obfuscation, packing, encryption, and anti-debugging measures to hide the functionality of their code, thus making it more difficult to analyze. They may also use polymorphism, which involves continually changing the structure of their code to evade detection by signature-based antimalware software.

During analysis, it is important to extract as much useful information from the target as possible in order to classify malware and develop effective detection mechanisms and rules.

Before we begin, note that while static analysis can provide valuable information in some cases, in others it takes considerable additional time and effort to get past the protections in place. It is therefore important to supplement it with other techniques, such as dynamic analysis, to gain a more complete understanding of the malware’s behavior and extract runtime data.

Unpacking malicious scripts for macOS, step by step

This article aims to provide insights on unpacking scripts from Mach-O binaries that have been compiled with SHC (Shell script compiler) using the dynamic analysis framework Qiling.

There is almost no information on emulating Mach-O binaries except for simple examples and test cases in the framework repository. This article also intends to fill this gap.

1. SHC

This experiment uses the SHC-3.8.9b fork for macOS, which can be fetched from the GitHub repository used in the installation steps below. All actions were executed on macOS Ventura 13.2.1 in a virtual environment.

Important: Before cloning repositories and executing arbitrary commands, it is necessary to first review potentially dangerous code. It is also better to use a virtual environment.

Here are the installation steps:

$ git clone https://github.com/chris1111/SHC-3.8.9b.git
$ cd SHC-3.8.9b
$ make
$ chmod +x shc
$ cp ./shc /usr/local/bin

According to the SHC(1) man page, “SHC itself is not a compiler such as cc, it rather encodes and encrypts a shell script and generates C source code with the added expiration capability. It then uses the system compiler to compile a stripped binary which behaves exactly like the original script.”

In other words, the input script is decrypted and evaluated at runtime by the binary that SHC produces.

Next, here’s how to encode a simple script: hello.sh

#!/bin/bash
echo "Hello!!!"

Let’s look at the command-line arguments SHC provides:

$ shc -h
shc Version 3.8.9b, Generic Script Compiler
shc Copyright (c) 1994-2015 Francisco Rosales <frosal@fi.upm.es>
shc Usage: shc [-e date] [-m addr] [-i iopt] [-x cmnd] [-l lopt] [-rvDTCAh] -f script

    -e %s  Expiration date in dd/mm/yyyy format [none]
    -m %s  Message to display upon expiration ["Please contact your provider"]
    -f %s  File name of the script to compile
    -r     Relax security. Make a redistributable binary
    -T     Allow binary to be traceable [no]
    -h     Display help and exit
    ...
    some arguments were omitted

We can pass our script using the -f argument. Additionally, we will use -r, which stands for “Relax security.” This flag allows the binary to run on any system with the same operating system (in our case, macOS).

$ shc -r -f ./hello.sh
$ file *
hello.sh:     Bourne-Again shell script text executable, ASCII text
hello.sh.x:   Mach-O 64-bit executable x86_64
hello.sh.x.c: c program text, ASCII text

As you can see, SHC produced two files: hello.sh.x.c (the generated C source code) and hello.sh.x (the compiled binary).

Here’s what happens when you try to run the binary:

$ chmod +x ./hello.sh.x
$ ./hello.sh.x
./hello.sh.x: Operation not permitted
Killed: 9

macOS killed our binary because of its ptrace usage. A quick look at the generated source code reveals the implementation of the untraceable() security measure:

void untraceable(char * argv0)
{
    char proc[80];
    int pid, mine;

    switch(pid = fork()) {
    case  0:
        pid = getppid();
        /* For problematic SunOS ptrace */
#if defined(__FreeBSD__)
        sprintf(proc, "/proc/%d/mem", (int)pid);
#else
        sprintf(proc, "/proc/%d/as",  (int)pid);
#endif
        mine = !open(proc, O_RDWR|O_EXCL);
        if (!mine && errno != EBUSY)
            mine = !ptrace(PT_ATTACH, pid, 0, 0);
        if (mine) {
            kill(pid, SIGCONT);
        } else {
            perror(argv0);
            kill(pid, SIGKILL);
        }
        _exit(mine);
    case -1:
        break;
    default:
        if (pid == waitpid(pid, 0, 0))
            return;
    }
    perror(argv0);
    _exit(1);
}

This implementation isn’t designed for macOS and requires modification, signing, and higher privileges to work. Here’s how to bypass this check and unpack the script.

Add the -T switch to allow the binary to be traceable:

$ shc -T -r -f ./hello.sh
$ ./hello.sh.x
Hello!!!

This time, the binary executed normally and printed “Hello!!!” as expected.

The next step is to test the expiration (trial) functionality:

$ shc -T -r -f ./hello.sh -e "02/01/1970" -m "Trial ended"
$ ./hello.sh.x
./hello.sh.x: has expired!
Trial ended

The rest of this experiment uses a binary built with all security features enabled:

$ shc -r -f ./hello.sh -e "02/01/1970" -m "Trial ended"

2. hello.sh.x[.c]

In this section, we will review how the produced binary works at runtime. We’ll also do some simple static and dynamic analysis.

$ strings -5 -a hello.sh.x
=%lu %d
%lu %d%c
%s%s%s: %s
<null>
...
garbage here
...

The strings command outputs a few strings and confirms that all useful information is encrypted and inaccessible. From the source code, it’s easy to understand how the runtime part works.

SHC uses the ARC4 (Alleged RC4) stream cipher. Every time SHC produces the source code that will later be compiled, it generates a random 256-byte password that is used as the cipher initialization key. All the provided and predefined data (strings, flags, etc.) is encrypted in a specific constant order and placed at random positions in a data buffer. Predefined labels reference each item’s length (suffixed with _z) and the encrypted data itself.

static  char data [] = 
#define      chk1_z	22
#define      chk1	((&data[1]))
	"\137\117\130\001\237\027\315\342\020\323\012\323\154\141\300\253"
	"\376\151\256\061\203\034\170"
#define      msg2_z	19
#define      msg2	((&data[24]))
	"\064\066\322\347\376\166\006\127\002\361\003\311\127\033\206\165"
	"\354\051\262\011\132\043\215"
#define      msg1_z	42
#define      msg1	((&data[56]))
...

Because of this randomization, binaries built from the same script differ at the byte level even though their code flow is identical. That makes it hard or impossible to write static signatures that detect more than one file with the same payload; only the generic stub, which is identical in all binaries, can be detected. Little usable data remains besides the size of the script, and that is hardly helpful because it is prone to false positives.
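For reference, the textbook ARC4 algorithm mentioned above can be sketched in a few lines of Python. This is a minimal, generic version and not a drop-in decryptor for SHC binaries: the generated C stub may set up and reuse its cipher state differently, and the function names and the sample key below are assumptions made purely for illustration.

```python
def arc4_init(key: bytes) -> list:
    """Key-scheduling algorithm (KSA): permute 0..255 under the key."""
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]
    return state


def arc4_crypt(key: bytes, data: bytes) -> bytes:
    """XOR data with the ARC4 keystream (encrypts and decrypts alike)."""
    state = arc4_init(key)
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) % 256])
    return bytes(out)


# Illustrative key and message, not values taken from an SHC binary.
ciphertext = arc4_crypt(b"Key", b"Plaintext")
print(ciphertext.hex())
print(arc4_crypt(b"Key", ciphertext))
```

The operation is symmetric, so calling arc4_crypt a second time with the same key returns the original bytes.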

Execution flow

untraceable(): the anti-tracing protection mentioned earlier.
check_env(): checks whether a dynamically generated environment variable exists and is valid. It generates the variable using some pointer arithmetic and sets it if the result is not null. It returns 0 if it can’t validate the variable, which triggers a relaunch without payload execution.
Check the trial condition and exit if the expiration date is earlier than the current date.
Once all the data is decrypted, build an argv[] array that includes the interpreter from the script, the argument that makes it evaluate the script inline (e.g., -c), and the script payload, prefixed with 4096 bytes of padding to hide it from monitoring tools that hook process execution.
Execute the interpreter using execvp.

Note: To get the maximum argv size, execute getconf ARG_MAX. On macOS Ventura, the result is 1048576, more than enough to pass the script content in the argument of an interpreter.
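The argument layout described above can be modeled with a short Python sketch. The helper names are hypothetical, the ordering of the entries follows the -D debug output shown later in this section, and the size check is only a rough estimate (the real limit also covers the environment), so treat this as an illustration rather than a reimplementation of the stub.

```python
PADDING = 4096     # padding length described in the execution flow above
ARG_MAX = 1048576  # getconf ARG_MAX result quoted for macOS Ventura


def build_interpreter_argv(argv0, inline_flag, script, extra_args=()):
    """Model the argv[] array handed to the interpreter (hypothetical helper)."""
    padded_script = " " * PADDING + script
    return [argv0, inline_flag, padded_script, argv0, *extra_args]


def argv_size(argv):
    """Rough size estimate: every argument plus its terminating NUL byte."""
    return sum(len(arg.encode()) + 1 for arg in argv)


script = '#!/bin/bash\necho "Hello!!!"\n'
argv = build_interpreter_argv("./hello.sh.x", "-c", script)
print(argv[:2], repr(argv[2][PADDING:]))
print("fits in ARG_MAX:", argv_size(argv) < ARG_MAX)
```

Because the padding sits in front of the payload, a tool that only displays the first few thousand characters of each argument would show nothing but whitespace.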

We can also make a build with the debug information by passing a -D switch, which outputs some useful information:

$ shc -T -D -r -f ./hello.sh
$ ./hello.sh.x
...
getenv(xffff6c72d479f7c4)=18446581839179675588 2
shll=/bin/sh
argc=5
argv[0]=./hello.sh.x
argv[1]=-c
argv[2]=  <--- HERE MUST BE OUR SCRIPT
argv[3]=./hello.sh.x
argv[4]=2
argv[5]=<null>
Hello!!!

Ways to get the decrypted script

There are several ways to obtain the decrypted script:

Disassemble and parse opcodes to get encrypted data lengths and offsets.
Patch binary to skip all checks and print the script.
Use Endpoint Security Framework or dtrace to hook the interpreter process exec.
Use a debugger or runtime instrumentation tool like Frida.
Use an emulation framework like Qiling.

The simplest approach is process monitoring. One option is ProcessMonitor, an ESF-based tool from Patrick Wardle, which catches the script when the interpreter is executed:

{
    "event":"ES_EVENT_TYPE_NOTIFY_EXEC",
    "timestamp":"2023-03-07 13:02:11 +0000",
    "process":{
        "pid":51544,
        "name":"bash",
        "path":"/bin/bash",
        "uid":501,
        "architecture":"unknown",
        "arguments":[
            "./hello.sh.x",
            "-c",
            "<-- 4096 space padding -->#!/bin/sh\necho \"Hello!!!\"","./test.sh.x"
        ],
        "ppid":36994,
        ...
    },
    ...
}
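Pulling the script out of such an event can be automated. The sketch below assumes the event is available as a JSON string with the same field names as in the excerpt above, and that the interpreter is invoked with -c; the function name and the sample event are made up for illustration.

```python
import json


def script_from_exec_event(event_json, inline_flag="-c"):
    """Return the script passed after the inline flag, without its padding."""
    event = json.loads(event_json)
    arguments = event["process"]["arguments"]
    payload = arguments[arguments.index(inline_flag) + 1]
    return payload.lstrip(" ")


# Reduced sample event that mirrors the structure shown above.
sample_event = json.dumps({
    "event": "ES_EVENT_TYPE_NOTIFY_EXEC",
    "process": {
        "path": "/bin/bash",
        "arguments": [
            "./hello.sh.x",
            "-c",
            " " * 4096 + '#!/bin/sh\necho "Hello!!!"',
            "./hello.sh.x",
        ],
    },
})

print(script_from_exec_event(sample_event))
```

Stripping only leading spaces is enough here because the payload itself starts with the shebang line.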

The execution of binaries in a specific real (virtual) environment, however, has some limitations:

Unsupported Mach-O formats (specific SDK versions, unknown load commands, etc.)
No support for executing binaries built for other platforms (for example, ELF on macOS)

To make things universal, we can try to use an emulation framework that gives us multi-platform support and full control of loader and binary executions.

3. Qiling

The Qiling Framework is a powerful tool for the dynamic analysis of executable code. It is designed to emulate different architectures, operating systems, and environments, allowing it to execute code in a controlled environment and observe its behavior.

Qiling supports a wide range of file formats (Mach-O, ELF, PE) and can be used to analyze malware, exploit code, and other types of executable payloads. It is a Python framework whose API allows for easy integration with other tools and workflows, making it flexible and versatile for dynamic analysis.

The first step is to set up the Python environment and all required libraries:

# Install Pyenv
$ curl https://pyenv.run | bash

# Add Pyenv to your PATH
$ echo 'export PATH="$HOME/.pyenv/bin:$PATH"' >> ~/.bashrc
$ echo 'eval "$(pyenv init -)"' >> ~/.bashrc
$ echo 'eval "$(pyenv virtualenv-init -)"' >> ~/.bashrc

# Install Python 3.8 with Pyenv
$ pyenv install 3.8.12

# Set Python 3.8 as the global version
$ pyenv global 3.8.12

# Verify that Python 3.8 is being used
$ python --version

# Install Qiling Framework and radare2 Python binding
$ pip install qiling r2libr

rootfs

The root file system (rootfs) is a virtual file system emulated by Qiling that provides the environment in which the code executes, including the files and directories that the code can access.

Qiling Framework provides qilingframework/rootfs for different platforms and architectures. Unfortunately, the files needed to run even the simplest example in a macOS x86_64 environment aren’t available, as this requires dyld and the other libraries that the loader depends on.

One option is to use dyld from macOS 10.13, as it’s more primitive and contains fewer security checks. You can also try a custom dyld version that fits your needs. We had no luck running the dyld from macOS Ventura, so more research is needed there.

You can download macOS rootfs that suit your needs here.

4. un_shc.py

For this part, you first need to set up some basic code that emulates your binary using the Qiling Python library. The binary itself goes into the rootfs/x8664_macos/bin/ folder.

#!/usr/bin/env python3

from qiling.const import QL_VERBOSE, QL_OS, QL_ARCH
from qiling import Qiling

def main():
    ql = Qiling(
        ["rootfs/x8664_macos/bin/hello.sh.x"],
        "rootfs/x8664_macos", 
        ostype=QL_OS.MACOS, 
        archtype=QL_ARCH.X8664, 
        verbose=QL_VERBOSE.DISABLED,
    )
    ql.run()

if __name__ == "__main__":
    main()

Attempting to launch this will return the following error:

$ python3 ./un_shc.py
dyld: Library not loaded: /usr/lib/libSystem.B.dylib
  Referenced from: /
  Reason: no suitable image found.  Did find:
	/usr/lib/libSystem.B.dylib: mmap() page compare failed for '/usr/lib/libSystem.B.dylib'
	/usr/lib/libSystem.B.dylib: stat() failed with errno=-2
	/usr/lib/libSystem.B.dylib: mmap() page compare failed for '/usr/lib/libSystem.B.dylib'
	/usr/lib/libSystem.B.dylib: stat() failed with errno=1
	/usr/lib/libSystem.B.dylib: mmap() page compare failed for '/usr/lib/libSystem.B.dylib'
	/usr/lib/libSystem.B.dylib: mmap() page compare failed for '/usr/lib/libSystem.B.dylib'

Investigating this behavior reveals the root cause in the dyld source code, in ImageLoaderMachO::validateFirstPages. It seems that dyld can’t verify the pages it has mapped because of how the file offset passed to xmmap is interpreted. Qiling uses its mmap2 implementation for the macOS mmap syscall, and mmap2 expects the offset in units of the page size (offset/page_size), whereas the macOS mmap syscall takes a byte offset. You can register the mmap implementation instead and also handle exceptions raised while unmapping the shared memory.

...
from qiling.os.posix.syscall.mman import ql_syscall_mmap, ql_syscall_munmap
...

def ql_syscall_munmap_error_fix(*args):
    # Wrap Qiling's munmap handler and ignore errors raised
    # while the shared memory is being unmapped.
    try:
        ql_syscall_munmap(*args)
    except Exception:
        pass


def main():
    ...
    # 0x20000c5 is the macOS mmap syscall number
    ql.os.set_syscall(0x20000c5, ql_syscall_mmap)
    # 0x2000049 is the macOS munmap syscall number
    ql.os.set_syscall(0x2000049, ql_syscall_munmap_error_fix)
    ql.run()
...
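Two numeric details of this fix can be checked with a little arithmetic: how the same offset argument is interpreted by mmap and by mmap2, and where the syscall numbers 0x20000c5 and 0x2000049 come from. The sketch below is plain Python and does not call Qiling; the helper names and the 4 KiB page size are assumptions made for illustration, while the class/shift values and the BSD numbers 197 (mmap) and 73 (munmap) follow the XNU syscall conventions.

```python
PAGE_SIZE = 0x1000  # 4 KiB pages, assumed for this illustration


def file_offset_seen_by_mmap(offset_arg):
    """mmap takes the file offset in bytes."""
    return offset_arg


def file_offset_seen_by_mmap2(offset_arg):
    """mmap2 takes the file offset in page-size units."""
    return offset_arg * PAGE_SIZE


SYSCALL_CLASS_SHIFT = 24
SYSCALL_CLASS_UNIX = 2


def unix_syscall_number(bsd_number):
    """Combine the UNIX syscall class with a BSD syscall number."""
    return (SYSCALL_CLASS_UNIX << SYSCALL_CLASS_SHIFT) | bsd_number


byte_offset = 0x4000  # example byte offset, as a macOS caller would pass it
print(hex(file_offset_seen_by_mmap(byte_offset)))
print(hex(file_offset_seen_by_mmap2(byte_offset)))
print(hex(unix_syscall_number(197)), hex(unix_syscall_number(73)))
```

A byte offset handed to an mmap2-style handler lands PAGE_SIZE times too far into the file, which is consistent with the page comparison failing.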

Running the script again returns another error:

dyld: cannot load '' (load command 0x80000034 is unknown)

The reason behind this error is an unsupported load command: SHC built your binary for a newer macOS version without a minimum supported version being specified in the build flags, so the older dyld does not recognize the command (0x80000034 is LC_DYLD_CHAINED_FIXUPS in the Mach-O loader headers). Fortunately, you can patch out this check in ImageLoaderMachO::parseLoadCmds.

...

def dyld_unknown_cmd_check_hook(ql: Qiling, *args, **kwargs):
    # skipping jump
    ql.arch.regs.rip = ql.arch.regs.rip + 0x2 

def main():
    ...
    # By disassembling dyld we can find the offset of the check:
    # if (firstUnknownCmd != NULL) {
    # __text:000000000001482C 48 83 7D C0 00  cmp     [rbp+var_40], 0
    # __text:0000000000014831 75 69           jnz     short loc_1489C
    dyld_unknown_cmd_check_addr = ql.loader.dyld_slide + 0x14831
    ql.hook_address(dyld_unknown_cmd_check_hook, dyld_unknown_cmd_check_addr)
    ...

This example uses ql.loader.dyld_slide to get the address at which dyld is loaded and ql.hook_address to hook execution at a specific address. In the hook callback, you can skip the jump by adding the size of the jump instruction (2 bytes) to the value of the rip register.
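The 0x2 added to rip comes straight from the instruction encoding in the disassembly comment: 75 69 is a short jnz, one opcode byte plus one signed 8-bit displacement. The sketch below decodes such an instruction in plain Python; decode_short_jcc is a hypothetical helper written for this illustration, not part of Qiling or radare2.

```python
def decode_short_jcc(address, encoding):
    """Return (fall-through address, jump target) of a short conditional jump."""
    if len(encoding) != 2:
        raise ValueError("a short conditional jump is exactly 2 bytes long")
    opcode, rel8 = encoding[0], encoding[1]
    if not 0x70 <= opcode <= 0x7F:
        raise ValueError("not a short conditional jump opcode")
    if rel8 >= 0x80:  # the displacement is a signed 8-bit value
        rel8 -= 0x100
    fallthrough = address + 2
    return fallthrough, fallthrough + rel8


# Bytes and offset taken from the disassembly comment above.
fallthrough, taken = decode_short_jcc(0x14831, bytes.fromhex("7569"))
print(hex(fallthrough), hex(taken))
```

The fall-through address is where the hook sends execution, and the taken address matches loc_1489C from the listing, the branch that reports the unknown load command.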

Your script will output the following error after launch:

...
Exception: Mach Msgid Not Found

Qiling doesn’t handle message ID 3403 (mach_ports_register), which is used during the fork() call. In any case, the binary executes and crashes in the untraceable() function during fork(). This anti-debug check can simply be skipped.

Qiling has an r2 extension that uses the r2libr module. It simplifies basic reverse-engineering tasks such as resolving imports and finding cross-references. In this case, you can check whether the ptrace import exists and find the xrefs to it.

...

from qiling.extensions.r2 import R2

...

def untracable_before_hook(ql: Qiling, *args, **kwargs):
    # skipping call
    ql.arch.regs.rip = ql.arch.regs.rip + 0x5 

def main():
    ...
    r2 = R2(ql)
    # mute r2libr
    r2._cmd("e anal.cc=default; e scr.prompt=false")

    try:
        ptrace_xref = r2.refto(r2.functions["sym.imp.ptrace"].offset)[0]
        # xref.name: 'sym.func.100003700+186'
        untracable_addr = r2.functions[ptrace_xref.name.split("+")[0]].offset
        untracable_call_xref = None
        for xref in r2.refto(untracable_addr):
            if xref.type == "CALL":
                untracable_call_xref = xref
                break
        ql.hook_address(untracable_before_hook, untracable_call_xref.fromaddr)
    except Exception:
        # best effort: if the ptrace import or its call site is not found, skip this hook
        pass

After successfully bypassing the untraceable() check, you’ll get the next error:

rootfs/x8664_macos/bin/hello.sh.x: has expired!
Trial ended

The binary fails to proceed because of the expiration date check. To skip it, you can modify the value returned by time().

...

def time_exit_hook(ql: Qiling, *args, **kwargs):
    # reset time to 01/01/1970
    ql.arch.regs.rax = 0x0

def main():
    ...
    time_xref = r2.refto(r2.functions["sym.imp.time"].offset)[0]
    time_exit_addr = time_xref.fromaddr + 0x5
    ql.hook_address(time_exit_hook, time_exit_addr)
    ...
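Why returning 0 from time() defeats the trial can be shown with a small model of the check, based on the description in the execution flow (exit if the expiration date is earlier than the current date). The helper names are hypothetical, and the sketch uses UTC to stay reproducible, which may differ from how the stub converts the date.

```python
import calendar
import time


def expiry_timestamp(date_ddmmyyyy):
    """Convert a dd/mm/yyyy date, as passed to shc -e, to a Unix timestamp (UTC)."""
    day, month, year = (int(part) for part in date_ddmmyyyy.split("/"))
    return calendar.timegm((year, month, day, 0, 0, 0, 0, 0, 0))


def has_expired(expiry, now):
    """Model of the trial check: expired when the expiry date is in the past."""
    return expiry < now


expiry = expiry_timestamp("02/01/1970")
print(expiry)
print(has_expired(expiry, now=int(time.time())))  # real clock
print(has_expired(expiry, now=0))                 # clock forced to the epoch by the hook
```

With the hook in place, the emulated clock reads 01/01/1970, which is earlier than any expiration date that can sensibly be encoded, so the check passes.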

As described in the previous section, there is also a check_env function. You need to modify its return value and set it to a non-zero value.

...

def check_env_exit_hook(ql: Qiling, *args, **kwargs):
    # force a non-zero check_env() result
    ql.arch.regs.rax = 0x1

...

def main():
    ...
    # getting check_env return address to modify returned value
    # search for getpid import xref, used by check_env function
    getpid_xref = r2.refto(r2.functions["sym.imp.getpid"].offset)[0]
    # xref.name: 'sym.func.1000034d0+31'
    check_env_addr = r2.functions[getpid_xref.name.split("+")[0]].offset
    check_env_call_xref = None
    for xref in r2.refto(check_env_addr):
        if xref.type == "CALL":
            check_env_call_xref = xref
            break
    check_env_exit_addr = check_env_call_xref.fromaddr + 0x5  # check_env call offset + size of call
    ql.hook_address(check_env_exit_hook, check_env_exit_addr)
    ...
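The lookup pattern used for both untraceable() and check_env() (take the function name from an xref, find the first CALL to that function, then step over the call) can be factored into small helpers. The sketch below uses a stand-in Xref dataclass that exposes only the name, type, and fromaddr fields used in the snippets above; it does not import the r2 extension, and the sample addresses are made up.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

CALL_REL32_SIZE = 5  # direct near call: opcode E8 plus a 4-byte displacement


@dataclass
class Xref:
    """Stand-in for an r2 cross-reference (only the fields used here)."""
    name: str
    type: str
    fromaddr: int


def enclosing_function(xref_name: str) -> str:
    """'sym.func.1000034d0+31' -> 'sym.func.1000034d0'."""
    return xref_name.split("+")[0]


def first_call_site(xrefs: Iterable[Xref]) -> Optional[int]:
    """Address of the first CALL reference, or None if there is none."""
    for xref in xrefs:
        if xref.type == "CALL":
            return xref.fromaddr
    return None


def return_address(call_site: int) -> int:
    """Address of the instruction that follows a 5-byte direct call."""
    return call_site + CALL_REL32_SIZE


# Made-up sample data for illustration.
xrefs = [
    Xref("sym.func.100003900+12", "DATA", 0x100003000),
    Xref("sym.func.100003900+44", "CALL", 0x10000392C),
]
call_site = first_call_site(xrefs)
print(enclosing_function(xrefs[1].name), hex(call_site), hex(return_address(call_site)))
```

Returning None when no CALL reference exists lets the caller decide explicitly whether to skip the hook. The 5-byte size only holds for direct near calls; an indirect call would need a different length.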

To get the script content, hook the memcpy import.

...

def memcpy_enter_hook(ql, *args, **kwargs):
    scpt = ql.mem.string(ql.arch.regs.rsi)
    print(scpt)
    # stop the emulation
    ql.stop()


def main():
    ...
    # hooking memcpy function to extract decrypted string
    memcpy_xref = r2.refto(r2.functions["sym.imp.__memcpy_chk"].offset)[0]
    ql.hook_address(memcpy_enter_hook, memcpy_xref.fromaddr)
    ...
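The hook reads rsi because, under the System V AMD64 calling convention used on x86_64 macOS, the first integer or pointer arguments are passed in rdi, rsi, rdx, rcx, r8, and r9, and the source buffer is the second argument of __memcpy_chk(dest, src, len, destlen). A minimal lookup table makes the mapping explicit; the helper name is hypothetical.

```python
# Integer/pointer argument registers of the System V AMD64 calling convention.
SYSV_INT_ARG_REGS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")

# Parameter order of __memcpy_chk(dest, src, len, destlen).
MEMCPY_CHK_PARAMS = ("dest", "src", "len", "destlen")


def arg_register(params, name):
    """Return the register that carries the named parameter."""
    return SYSV_INT_ARG_REGS[params.index(name)]


for param in MEMCPY_CHK_PARAMS:
    print(param, "->", arg_register(MEMCPY_CHK_PARAMS, param))
```

The same table shows that the copy length is available in rdx at the hooked call, should you prefer to read an exact number of bytes instead of a NUL-terminated string.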

Finally, running the complete script prints the content of your script:

#!/bin/sh
echo "Hello world!"

If you test our script with a malicious file, such as an XCSSET malware sample, the script outputs:

#!/bin/bash

AUTOCLEAN=$2
BASEDIR=$1
BASEDIR=${PROJECT_FILE_PATH}

BUILD_VERSION=1
BUILD_VENDOR="default"

RANDOM_PATHS=("$HOME/Library/Application Support/iCloud" "$HOME/Library/Application Scripts/com.apple.AddressBook.Shared" "$HOME/Library/Group Containers/group.com.apple.notes" "$HOME/Library/Containers/com.apple.routerd")


DOMAIN_ONE=$(echo "61 74 65 63 61 73 65 63 2e 69 6e 66 6f" | xxd -p -r)
DOMAIN_TWO=$(echo "6c 75 63 69 64 61 70 70 73 2e 69 6e 66 6f" | xxd -p -r)
DOMAIN_THREE=$(echo "69 63 6c 6f 75 64 73 65 72 76 2e 72 75" | xxd -p -r)

DOMAIN_FOUR=$(echo "72 65 76 6f 6b 65 63 65 72 74 2e 72 75" | xxd -p -r)
DOMAIN_FIVE=$(echo "64 61 74 61 73 6f 6d 61 74 69 63 2e 72 75" | xxd -p -r)
DOMAIN_SIX=$(echo "72 65 6c 61 74 69 76 65 64 61 74 61 2e 72 75" | xxd -p -r)


ACTIVE_DOMAINS=(${DOMAIN_ONE} ${DOMAIN_TWO} ${DOMAIN_THREE} ${DOMAIN_FOUR} ${DOMAIN_FIVE} ${DOMAIN_SIX})
TARGET_DOMAIN=${ACTIVE_DOMAINS[RANDOM%${#ACTIVE_DOMAINS[@]}]}

...

SHC hinders static analysis, and the process detailed in this article produces a tool that dynamically extracts the hidden payload. A tool like this can be integrated into a malware analysis pipeline, such as karton, to improve and speed up the analysis process. And although this method requires a few workarounds, this simple tool does the job and recovers the original script source.

Mykola N.

Mykola is a macOS security researcher and malware analyst at Moonlock, the cybersecurity division of MacPaw. With 20 years of experience in several cybersecurity areas, he is a participant of the Apple Security Bounty program and has earned multiple rewards for his reports on macOS vulnerabilities.