Introduction

Recently, numerous malicious RTF files generated by the Royal Road RTF weaponizer tool have been spotted in the wild, delivering attacks attributed to Chinese state-sponsored actors. In this post, I analyze one of the files used in cyber espionage against Tibetan citizens to explain how to identify the embedded exploit and to highlight some features of its shellcode.

Overview

The first view of the RTF file shows the following two embedded objects:

  • ghb4nrwmp.wmf
  • Equation.2\x00\x124Vx\x90\x124VxvT2.

img(RTF document’s embedded objects)

When the file is opened, the process EQNEDT32.exe gets executed and spawns another EQNEDT32.exe process. Then, the child EQNEDT32.exe process launches a subprocess rundll32.exe.

img(The process tree when opening the RTF file)

The process tree above suggests that the file contains an exploit targeting EQNEDT32.exe. This executable is the Microsoft Office Equation Editor, which is responsible for interpreting the equations embedded in document files.

The exploit is therefore most likely embedded in the equation object, which is called Equation.2\x00\x124Vx\x90\x124VxvT2.

Setting up a debugging environment

Debugging the EQNEDT32.exe process will help us later to identify the CVE (Common Vulnerabilities and Exposures entry) that the exploit targets, as well as the bytes of the exploit itself.

This can be achieved with the Image File Execution Options (IFEO) feature, which lets a debugger attach to the EQNEDT32.exe process as soon as it starts.

img(Creating key for EQNEDT32 process under IFEO in registry)
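The registry change shown in the screenshot can also be scripted. The following is a minimal sketch, not the exact setup used here: it only builds the text of a .reg file that sets the IFEO `Debugger` value, and the debugger path `C:\Tools\x32dbg.exe` is a hypothetical placeholder. The resulting file still has to be imported manually with administrator rights.

```python
IFEO_ROOT = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT"
    r"\CurrentVersion\Image File Execution Options"
)


def ifeo_reg_file(image_name: str, debugger_path: str) -> str:
    """Return .reg text that sets the IFEO 'Debugger' value for image_name."""
    # String values in .reg files need backslashes and quotes escaped.
    escaped = debugger_path.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{IFEO_ROOT}\\{image_name}]",
        f'"Debugger"="{escaped}"',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    # Hypothetical debugger path; replace it with the debugger actually in use.
    print(ifeo_reg_file("EQNEDT32.EXE", r"C:\Tools\x32dbg.exe"))
```

Removing the `Debugger` value afterwards restores the normal behavior of EQNEDT32.exe.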

Determining the targeted CVE

My approach for identifying the targeted CVE relies on two things: the address of the vulnerable function and the field of the Object Linking and Embedding (OLE) object that contains the exploit bytes, as explained below.

Finding the vulnerable function

Usually, exploits use one of the following two techniques to execute unintended code after hijacking the execution flow:

  • Executing shellcode directly from non-executable (NX) memory regions such as the stack or heap, which works only when Data Execution Prevention (DEP) is off.
  • Using return-oriented programming (ROP) to bypass DEP by executing chosen machine instruction sequences from executable memory sections.

First, I assumed that the first technique is used, that is, the shellcode executes from an NX memory region. To identify the start address of the shellcode, I enforced DEP for all Windows processes, so that an exception is raised as soon as NX memory gets executed.

img(Turning on DEP mitigation for all Windows processes)

Now, when the RTF file is opened, the debugger attaches to EQNEDT32.exe, and execution continues until an exception is triggered. The address that causes the exception points to the start of the shellcode.

In the screenshot below, the address 0058b63a is the start of the shellcode and is located in heap memory.

img(The starting address of the shellcode)

Note that the EBP register holds a garbage value, which indicates a stack buffer overflow that corrupted the saved frame pointer of the caller function.

img(The corrupted EBP register)

Looking at the stack, we can see the exploit bytes that were used to overflow the buffer and control the return address.

Because the return address of the vulnerable function has been overwritten, the only intact return address on the stack is the one belonging to its caller, as shown below.

img(The state of the stack after the overflow)

Following that return address in the disassembly shows that the function at address 43a78f is the caller of the vulnerable function.

img(Finding the caller of the vulnerable function)

After that, I set a breakpoint on the ret instruction of every function called from the caller (at address 43a78f). Then I continued execution until one of those breakpoints was hit with EBP holding the corrupted value 6a616161*.

Therefore, the function at address 443e34 is the vulnerable one.

img(The vulnerable function)

Locating the exploit within RTF structure

Now it’s time to locate the field within the OLE object that contains the exploit. As mentioned previously, the exploit consists of the following bytes.

img(The bytes of exploit on the stack)

Before searching the RTF file for the exploit content, the deobfuscated content of the embedded object named Equation.2\x00\x124Vx\x90\x124VxvT2 has to be extracted. This is achieved by setting a breakpoint at the OleConvertOLESTREAMToIStorage API and then dumping the deobfuscated object pointed to by the first parameter, lpolestream. The Mandiant blog post explains this trick in detail.

Finally, the following screenshot shows a snippet of the OLE object that contains the MathType (MTEF) object. This MTEF object carries the exploit bytes inside the content of the Matrix tag.
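Once the object is dumped, the bytes seen on the stack can be located in it with a plain byte search. The sketch below is only an illustration under stated assumptions: the `dump` buffer is synthetic, the helper names are mine, and with a real dump the bytes would be read from a file instead. It also shows how a register value such as the corrupted EBP translates into the byte order stored in memory on x86.

```python
import struct


def dword_to_bytes(value: int) -> bytes:
    """Return the little-endian byte sequence x86 stores for a 32-bit value."""
    return struct.pack("<I", value)


def find_all(haystack: bytes, needle: bytes) -> list[int]:
    """Return every offset at which needle occurs in haystack."""
    offsets = []
    pos = haystack.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = haystack.find(needle, pos + 1)  # allow overlapping matches
    return offsets


if __name__ == "__main__":
    # Synthetic stand-in for the dumped object (assumption for illustration);
    # with a real dump, read the file contents in binary mode instead.
    dump = b"\x00" * 16 + b"aaaaaaaj" + b"\x00" * 8
    needle = dword_to_bytes(0x6A616161)  # corrupted EBP value from the debugger
    print(needle, find_all(dump, needle))
```

Searching for a longer run of the stack bytes than a single DWORD reduces the chance of accidental matches elsewhere in the object.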

img(The location of exploit within the embedded OLE object)

Note that the Matrix tag value in the above screenshot is not equal to 5 as in the MTEF documentation. The reason is explained in this blog post: the function at address 43A720 in EQNEDT32.exe maps the tag value to one of the standard values in the documentation before processing its content.

In short, the exploit targets a stack buffer overflow vulnerability in the function at address 443e34, and the exploit bytes are found in the Matrix tag of the MTEF object.

We can conclude that this exploit targets CVE-2018-0798, as the published analysis of this vulnerability matches our case.

Shellcode analysis

This section sheds light on what the shellcode tries to achieve and the anti-analysis tricks it uses.

Retrieving the base addresses of the needed DLL(s)

The shellcode walks the InInitializationOrderModuleList, a doubly-linked list of LDR_DATA_TABLE_ENTRY nodes in which every node represents a loaded DLL, to retrieve the base addresses of the following DLL(s).

  • MSVCRT.dll

    img(Retrieving MSVCRT.dll base address)

  • KERNEL32.dll

    img(Retrieving KERNEL32.dll base address)
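The list walk can be modelled in a few lines. This is a simulation in Python rather than the shellcode itself: `Module` is a hypothetical, heavily simplified stand-in for an LDR_DATA_TABLE_ENTRY node (only a forward link is kept), the base addresses are made up, and the case-insensitive name comparison is an assumption, since the sample’s matching logic is only visible in the screenshots.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Module:
    """Simplified stand-in for one LDR_DATA_TABLE_ENTRY node."""
    name: str
    base: int
    # Next node in the circular list; excluded from repr/eq to avoid recursion.
    flink: Optional["Module"] = field(default=None, repr=False, compare=False)


def link(modules: list[Module]) -> Module:
    """Chain the modules into a circular list and return the first node."""
    for current, following in zip(modules, modules[1:] + modules[:1]):
        current.flink = following
    return modules[0]


def find_base(head: Module, wanted: str) -> Optional[int]:
    """Follow flink pointers until the wanted DLL is found or the list wraps."""
    node = head
    while True:
        if node.name.lower() == wanted.lower():
            return node.base
        node = node.flink
        if node is head:  # back at the start: the DLL is not in the list
            return None


if __name__ == "__main__":
    # Made-up load order and base addresses, for illustration only.
    head = link([
        Module("ntdll.dll", 0x77000000),
        Module("KERNEL32.dll", 0x76000000),
        Module("MSVCRT.dll", 0x75000000),
    ])
    for dll in ("MSVCRT.dll", "KERNEL32.dll"):
        print(dll, hex(find_base(head, dll)))
```

The wrap-around check mirrors the fact that the real list is circular, so a walker needs a stop condition for DLLs that are not loaded.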

Fetching the addresses of needed API functions

The way the needed API functions are retrieved is tricky: the shellcode iterates through the Import Address Table (IAT) of MSVCRT.dll rather than looking the functions up in the export tables of the loaded DLLs. This is an anti-analysis trick: because the API(s) are resolved through a library that is not loaded automatically, it is harder to analyze the shellcode dynamically outside the EQNEDT32 process.

When a function name in the Import Name Table (INT) has a hash equal to that of one of the needed API functions, the lookup immediately returns the function address + 5. The reason for that is explained in the next section.

img(Fetching API(s) From the IAT of MSVCRT.dll)
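The lookup logic can be sketched as follows. The hash algorithm the sample uses is not stated in this post, so the ROR-13 additive hash below is purely a stand-in assumption, and the import table is modelled as a dictionary with invented addresses.

```python
from typing import Optional

MASK32 = 0xFFFFFFFF


def ror13_hash(name: str) -> int:
    """ROR-13 additive hash, used here only as a stand-in for the real hash."""
    value = 0
    for byte in name.encode("ascii"):
        value = ((value >> 13) | (value << 19)) & MASK32  # rotate right by 13
        value = (value + byte) & MASK32
    return value


def resolve(imports: dict[str, int], wanted_hash: int) -> Optional[int]:
    """Return address + 5 of the first import whose name hash matches."""
    for name, address in imports.items():  # name from the INT, address from the IAT
        if ror13_hash(name) == wanted_hash:
            return address + 5  # points just past a 5-byte prologue
    return None


if __name__ == "__main__":
    # Invented addresses, for illustration only.
    fake_iat = {"CreateProcessA": 0x76001000, "ReadFile": 0x76002000}
    print(hex(resolve(fake_iat, ror13_hash("ReadFile"))))
```

Comparing hashes instead of strings keeps the API names out of the shellcode, which is why no readable function names show up in the exploit bytes.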

Calling the API Functions indirectly

Here is another trick employed in the shellcode: every API function, once its address is resolved, gets called from inside the clearerr API. That is achieved by replacing the content of the clearerr API with the instructions below.

img(The instructions that replace content of clearerr API)

The obvious goal of the above instructions is to push the parameters on the stack and call the API; however, the interesting part lies in the Call_API function.

As shown in the screenshot below, the Call_API function checks the first 5 bytes (the prologue) of the API function that is about to be called. If these bytes decode to a jmp instruction, it skips executing the API prologue. Calling API functions this way defeats inline hooking in case it’s used during the analysis.

img(Evading inline hooking technique)
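The decision made by Call_API can be modelled roughly as below. This is a simplified reading of the check, not a reconstruction of the sample’s code: treating only opcode 0xE9 (near relative jmp) as a hook, and the function names, are my assumptions. Whatever skips the prologue also has to reproduce its effect, which the model does not cover.

```python
JMP_REL32 = 0xE9  # opcode of a near relative jump, typical for inline hooks
PROLOGUE_LEN = 5
# mov edi, edi / push ebp / mov ebp, esp: the usual 5-byte hot-patch prologue
STANDARD_PROLOGUE = bytes.fromhex("8bff558bec")


def is_hooked(prologue: bytes) -> bool:
    """Report whether the first 5 bytes start with a jmp rel32."""
    return len(prologue) >= PROLOGUE_LEN and prologue[0] == JMP_REL32


def entry_point(api_address: int, prologue: bytes) -> int:
    """Pick the address to enter the API at: past the prologue if it holds a jmp."""
    if is_hooked(prologue):
        return api_address + PROLOGUE_LEN
    return api_address


if __name__ == "__main__":
    hooked = bytes.fromhex("e9aabbccdd")  # made-up jmp displacement
    print(hex(entry_point(0x76001000, hooked)))
    print(hex(entry_point(0x76001000, STANDARD_PROLOGUE)))
```

Entering at offset 5 works because an inline hook usually overwrites exactly those 5 bytes with a jmp and leaves the rest of the function untouched.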

Additionally, I think this technique not only evades inline hooking but also produces misleading results. For example, whenever any API function gets called, the hooking output will show it as a call to the clearerr API.

The shellcode in a nutshell

I don’t want to make the post longer, as I feel the next part doesn’t contain any interesting tricks. So briefly, the following API functions get resolved.

  • VirtualAlloc
  • ReadFile
  • CloseHandle
  • CreateProcessA
  • GetModuleFileNameA
  • ResumeThread
  • TerminateProcess
  • GetThreadContext
  • ReadProcessMemory
  • VirtualQueryEx
  • VirtualAllocEx
  • GetModuleHandleA
  • WriteProcessMemory
  • SetThreadContext
  • GetTempPathA

The resolved API functions are used to read the file ghb4nrwmp.wmf from the %temp% folder and then decrypt it. After that, the shellcode creates another EQNEDT32.exe process in suspended mode and injects the decrypted executable into its memory space to execute it.