By Emilie Mavel Christensen
This article is the second entry in a series about malware reverse engineering. Last week, we explored the why of malware reverse engineering. This week, we dive into the how and examine the obstacles that may arise along the way.
Overall process
At its core, malware consists of instructions, data or programs designed to execute specific tasks[1] on a computer with malicious intent. It typically takes the form of an executable file containing instructions that guide the computer’s central processing unit (CPU) through running the program[2].
There are two primary approaches to malware reverse engineering: static analysis and dynamic analysis. Static analysis involves examining software and its code without running it, while dynamic analysis involves running the software and observing its effects on a system[3], [4]. The former uses tools like disassemblers and/or decompilers, whereas the latter employs sandboxing or memory forensics[3].
It is essential to conduct malware reverse engineering in a secure and isolated environment, ideally a virtual machine with no network connection to other devices or the internet. While this article will not detail how to set up such an environment, numerous high-quality guides are available online.
Static analysis
Static analysis is usually the first step in reverse engineering. The process involves analysing the code or structure of a program to identify its functionalities without running it. This method is safer than dynamic analysis, as it does not require the analyst to run the malware to dissect it. However, it has limitations: it cannot detect runtime-specific behaviour, and the code under analysis may still be obscured by obfuscation techniques[5].
Some of the key techniques used in static analysis include:
- Analysing stored strings: In this common first step, analysts look at the strings stored in the program to uncover information, such as function names, error messages or IP addresses, which can provide an early understanding of the malware and its functionalities. Strings can be identified using a disassembler or other specialised software, such as the one aptly named Strings[6].
- Analysing program headers: The Portable Executable (PE) file format is used by Windows executables, object code, and DLLs, and contains the information necessary for the Windows OS loader to manage the wrapped executable code. PE files begin with a header that includes information about the code, the type of application, required library functions and space requirements, which is of great value to the analyst.
For example, if an analyst can see in the required library functions that a program uses the function URLDownloadToFile, it may be inferred that it connects to the Internet to download some content, which is then stored in a local file[3].
- Disassembling code: In short, disassembly is the process of ‘translating back’ code after it has been compiled to machine code. The resulting assembly code is a set of instructions that can be hard to read, but is worth looking into. For example, if an analyst encounters a function that consists almost entirely of logical, shift and rotate instructions, repeated in a seemingly random order, they can reasonably suspect an encryption or compression routine and label that chunk of code as such.
- Decompiling: Decompiling goes a step further, converting assembly code into a higher-level language for easier human understanding[7]. While the output is not a copy of the original code, merely an educated guess based on the behaviour of the binary and assembly code[3], [8], it makes for a more readable starting point for static code analysis.
- Symbolic execution: This advanced technique uses symbolic values instead of real inputs to analyse program behaviour. By generating equations or logic trees, analysts can predict how different inputs affect execution paths. However, this method can become unwieldy with complex or obfuscated code[9], [10].
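As a rough illustration of the string analysis step, the Python sketch below pulls printable ASCII runs out of a byte blob and flags any that match a small watch list. The watch list, the function names and the sample bytes are all made up for this example, and the sketch ignores the wide-character (UTF-16) strings that are common in Windows binaries, so it should be read as a simplified stand-in for a dedicated tool rather than a replacement for one.

```python
import re

# Hypothetical watch list for illustration only; real triage lists are far longer.
SUSPICIOUS_NAMES = {"URLDownloadToFile", "VirtualAllocEx", "WriteProcessMemory"}


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]


def flag_suspicious(strings: list[str]) -> list[str]:
    """Return the strings that contain a name from the watch list."""
    return [s for s in strings if any(name in s for name in SUSPICIOUS_NAMES)]


# Made-up bytes standing in for a file read with open(path, "rb").read().
sample = (
    b"\x00\x01MZ\x90\x00URLDownloadToFileA\x00"
    b"\xff\xfehttp://198.51.100.7/a.bin\x00ab\x00"
)

found = extract_strings(sample)
hits = flag_suspicious(found)
print(found)  # all strings of four or more printable characters
print(hits)   # the subset matching the watch list
```

Short runs such as "MZ" and "ab" fall below the minimum length and are dropped, which is the usual trade-off between noise and missed strings. A hit on a name like URLDownloadToFile is only a hint about what the program might do, not proof that the function is ever called.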
Dynamic analysis
Usually performed after a thorough static analysis, dynamic analysis involves monitoring software as it runs or examining a system after specific software has executed[3]. Multiple techniques can be employed to monitor the effects of the software. These include:
- Debugging: A debugger is a tool used to test or examine the execution of a program while said program is running[11]. Debuggers allow analysts to add breakpoints in the code, which pause the execution of the program and enable inspection and modification of current variables[3], [12]. Debuggers can also pause code execution when an exception is raised, so that an analyst or programmer can try to understand what triggered it[13].
A debugger and breakpoints can be useful tools to understand code. For example, one can put a breakpoint at an “if” statement, then observe the variation in the code execution based on the value fed to that “if” statement. This interactive, step-by-step process can give analysts valuable insight into the code[14].
- Memory forensics: Another way to perform dynamic analysis is to run the malware in a controlled environment and then analyse the environment’s memory, using tools like Volatility[15]. This allows analysts to observe the state of a system at a specific point in time and examine data stored in the system’s volatile memory[5].
- Emulation and sandboxing: Automated tools like Cuckoo Sandbox[16] or ANY.RUN[17] enable dynamic analysis of malware while limiting the risk to the analyst’s own systems. These tools run the malware in local or online virtual machines for a predefined period of time and record all changes made to the virtual system. Some tools even generate pre-filled reports summarizing the results of the emulation[5].
However, these tools present some issues. Code samples may be shared with other users, or even made publicly available. If a victim of a very specific piece of malware shares a sample on one of these online tools, attackers may learn that their attack has been detected and subsequently develop stealthier malware[18]. Furthermore, automated tools cannot prevent malware from running its anti-forensics/anti-VM functions, which are increasingly common[19].
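The idea of recording all changes made to a system can be sketched with a before-and-after snapshot of a directory tree. The Python example below is a toy model under stated assumptions: the helper names snapshot and diff_snapshots are invented here, the "activity" is a few harmless file operations in a temporary directory, and real sandboxes also track registry, process and network activity, which this sketch does not attempt.

```python
import hashlib
import os
import tempfile


def snapshot(root: str) -> dict[str, str]:
    """Map each file under root (by relative path) to the SHA-256 of its contents."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            state[os.path.relpath(full, root)] = digest
    return state


def diff_snapshots(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Classify paths as created, deleted or modified between two snapshots."""
    return {
        "created": sorted(set(after) - set(before)),
        "deleted": sorted(set(before) - set(after)),
        "modified": sorted(p for p in before.keys() & after.keys() if before[p] != after[p]),
    }


with tempfile.TemporaryDirectory() as root:
    # Baseline state of the stand-in "system".
    for name, text in (("notes.txt", "hello"), ("old.log", "entry")):
        with open(os.path.join(root, name), "w") as fh:
            fh.write(text)
    before = snapshot(root)

    # Simulated, harmless activity: drop one file, modify one, delete one.
    with open(os.path.join(root, "dropped.bin"), "wb") as fh:
        fh.write(b"\x00\x01")
    with open(os.path.join(root, "notes.txt"), "w") as fh:
        fh.write("tampered")
    os.remove(os.path.join(root, "old.log"))

    changes = diff_snapshots(before, snapshot(root))

print(changes)
```

Comparing content hashes rather than timestamps means a file that is rewritten with different bytes is reported as modified even if its metadata has been reset, although a snapshot approach still misses anything created and removed between the two measurements.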
Anti-forensics techniques
Malware writers anticipate that their code will land in the hands of security researchers sooner or later. To hinder detection and analysis, malware coders often employ anti-forensics or obfuscation techniques[20], [21]. Some of these techniques are:
- Masquerading: To evade detection, malware authors can mimic legitimate software by altering file metadata (name, icon), mimicking valid code signatures[23], or using a double file extension (e.g. “File.txt.exe”)[24]. This technique is called masquerading[22]. To uncover it, analysts can compare suspected malware to the software it impersonates, or verify the source from which the malware was downloaded.
- Detection of virtual machine/potential forensics environment: Malware often checks whether it is running in a virtual machine or analysis environment, so that it can avoid revealing its behaviour there. This can be done in various ways: for example, it may enumerate the running processes and open windows to detect known forensics tools[25] or search for files typical of virtual machines[26]. If a controlled environment is detected, the malware stalls or stops its process.
Analysts can counter these checks in different ways, for example by removing identifiable files from the virtual machine or by bypassing the virtual machine check when stepping through the code in a debugger.
- Embedded payloads: Nested/embedded payloads are frequently used as a defence evasion tactic by malware[27]. This means the adversary embeds payloads within other files to make them look legitimate. A variant of this technique is process injection, in which an adversary injects code into other processes to make it look legitimate or to elevate privileges[28]. One way to deal with this is to go through the malware code and look for long strings, or to debug the malware and see which processes are killed and/or started as one steps through the code.
- Encryption: Another common obfuscation technique for malware is the use of encryption, either to encrypt significant strings or to encrypt communication between the malware and a command-and-control (C2) server. Encrypting these strings can make them hard to recontextualize during a static analysis of the code[25], [29]. Although one can argue that these strings will likely be decrypted at runtime and are, as such, not obfuscated forever, running malware always carries a risk.
- Poly/metamorphic code: Poly/metamorphic malware changes its own code, for example by permuting its functions or running a mutation engine on itself, so that it appears to be a different file with each execution[32], [33]. This complicates static analysis, because samples of the same malware strain yield different findings, making it harder to attribute a malicious program to that strain or to develop efficient ways to block it[32], [34].
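To make the effect of string encryption on static analysis concrete, the Python sketch below hides a URL with a single-byte XOR and then recovers it by trying every possible key. Single-byte XOR is assumed here purely because it is easy to follow; it is not claimed to be what any particular malware family uses, and real samples often rely on stronger schemes. The URL, the key and the function names are invented for the example.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with the same single-byte key."""
    return bytes(b ^ key for b in data)


def brute_force_xor(blob: bytes, marker: bytes) -> list[tuple[int, bytes]]:
    """Try all non-zero single-byte keys; keep decodings that contain marker."""
    results = []
    for key in range(1, 256):
        decoded = xor_bytes(blob, key)
        if marker in decoded:
            results.append((key, decoded))
    return results


# Hypothetical C2 address (reserved .invalid domain) and hypothetical key.
secret = b"http://c2.example.invalid/gate"
blob = xor_bytes(secret, 0x5A)

# A plain string search on the encoded bytes finds nothing recognisable.
visible_before = b"http" in blob

# Guessing that a URL is present lets an analyst recover key and plaintext.
candidates = brute_force_xor(blob, b"http://")

print(visible_before)
print(candidates)
```

The brute-force step only works because the key space is tiny and the analyst can guess a marker such as "http://". With a longer key or a proper cipher, locating the decryption routine in the disassembly, or letting the malware decrypt its own strings under a debugger, becomes the more practical route.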
This second entry has explored the techniques used by malware analysts to reveal the secrets hidden in malicious code. Next week, we will apply this knowledge to analyse malware recently encountered by Trifork Security.
Read part 1 here.
Sources:
[1] ‘What Is Software? | Definition from TechTarget’, Search App Architecture. Accessed: Nov. 08, 2024. [Online]. Available: https://www.techtarget.com/searchapparchitecture/definition/software
[2] ‘Executable file: What is an Executable File in computing? | Lenovo US’. Accessed: Nov. 08, 2024. [Online]. Available: https://www.lenovo.com/us/en/glossary/executable-file/
[3] M. Sikorski and A. Honig, Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software. Accessed: Nov. 08, 2024. [Online]. Available: https://www.amazon.com/Practical-Malware-Analysis-Hands-Dissecting/dp/1593272901
[4] ‘pwn.college’. Accessed: Nov. 08, 2024. [Online]. Available: https://pwn.college/intro-to-cybersecurity/reverse-engineering/
[5] S. Sengupta, ‘Reverse Engineering Malware: Techniques And Tools For Analyzing And Dissecting Malicious Software’, Medium. Accessed: Nov. 13, 2024. [Online]. Available: https://sudip-says-hi.medium.com/reverse-engineering-malware-techniques-and-tools-for-analyzing-and-dissecting-malicious-software-4dd5949135f0
[6] markruss, ‘Strings – Sysinternals’. Accessed: Nov. 08, 2024. [Online]. Available: https://learn.microsoft.com/en-us/sysinternals/downloads/strings
[7] ‘What is decompile?’, WhatIs. Accessed: Nov. 11, 2024. [Online]. Available: https://www.techtarget.com/whatis/definition/decompile
[8] P. N. F. Software, ‘What is decompilation?’, Medium. Accessed: Nov. 11, 2024. [Online]. Available: https://medium.com/@pnfsoftware/what-is-decompilation-26ce48f282bc
[9] C. Thiede, ‘Symbolic Execution and Applications’.
[10] ‘About this class | Introduction | Reverse Engineering 3201: Symbolic Analysis | OpenSecurityTraining2’. Accessed: Nov. 14, 2024. [Online]. Available: https://apps.p.ost2.fyi/learning/course/course-v1:OpenSecurityTraining2+RE3201_symexec+2021_V1/block-v1:OpenSecurityTraining2+RE3201_symexec+2021_V1+type@sequential+block@49a49d1795634800a04e6f319407bf03/block-v1:OpenSecurityTraining2+RE3201_symexec+2021_V1+type@vertical+block@28badad322e24196923d01b2b2c8fc24
[11] ‘Debuggers’, IONOS Digital Guide. Accessed: Nov. 11, 2024. [Online]. Available: https://www.ionos.com/digitalguide/websites/web-development/debugger/
[12] ‘What is debugging?’, Search Software Quality. Accessed: Nov. 11, 2024. [Online]. Available: https://www.techtarget.com/searchsoftwarequality/definition/debugging
[13] Mikejo5000, ‘Debugging techniques and tools – Visual Studio (Windows)’. Accessed: Nov. 11, 2024. [Online]. Available: https://learn.microsoft.com/en-us/visualstudio/debugger/write-better-code-with-visual-studio?view=vs-2022
[14] jeFF0Falltrades, Reverse Engineering and Weaponizing XP Solitaire (Mini-Course), (Nov. 26, 2022). Accessed: Nov. 11, 2024. [Online Video]. Available: https://www.youtube.com/watch?v=ZmPArvsSii4
[15] ‘The Volatility Foundation – Promoting Accessible Memory Analysis Tools Within the Memory Forensics Community’. Accessed: Nov. 13, 2024. [Online]. Available: https://volatilityfoundation.org/
[16] ‘Cuckoo Sandbox – Automated Malware Analysis’. Accessed: Nov. 13, 2024. [Online]. Available: https://cuckoosandbox.org/
[17] ‘Interactive Online Malware Analysis Sandbox – ANY.RUN’. Accessed: Nov. 13, 2024. [Online]. Available: https://app.any.run/
[18] ‘Malware Analysis is for the (Cuckoo) Birds’, TrustedSec. Accessed: Nov. 14, 2024. [Online]. Available: https://trustedsec.com/blog/malware-cuckoo-1
[19] Avira, ‘Cuckoo Sandbox vs. Reality’, Avira Blog. Accessed: Nov. 14, 2024. [Online]. Available: https://www.avira.com/en/blog/cuckoo-sandbox-vs-reality-2
[20] ‘MITRE ATT&CK®’. Accessed: Nov. 14, 2024. [Online]. Available: https://attack.mitre.org/
[21] B. Blunden, Rootkit Arsenal: Escape and Evasion in the Dark Corners of the System. Jones & Bartlett Publishers, 2013.
[22] ‘Masquerading, Technique T1036 – Enterprise | MITRE ATT&CK®’. Accessed: Nov. 27, 2024. [Online]. Available: https://attack.mitre.org/techniques/T1036/
[23] ‘Masquerading: Invalid Code Signature, Sub-technique T1036.001 – Enterprise | MITRE ATT&CK®’. Accessed: Nov. 27, 2024. [Online]. Available: https://attack.mitre.org/techniques/T1036/001/
[24] ‘Masquerading: Double File Extension, Sub-technique T1036.007 – Enterprise | MITRE ATT&CK®’. Accessed: Nov. 27, 2024. [Online]. Available: https://attack.mitre.org/techniques/T1036/007/
[25] ‘An Example of Common String and Payload Obfuscation Techniques in Malware’, Security Intelligence. Accessed: Nov. 13, 2024. [Online]. Available: https://securityintelligence.com/an-example-of-common-string-and-payload-obfuscation-techniques-in-malware/
[26] ‘Qakbot Malware Takedown and Defending Forward | Huntress’. Accessed: Nov. 12, 2024. [Online]. Available: https://www.huntress.com/blog/qakbot-malware-takedown-and-defending-forward
[27] ‘Obfuscated Files or Information: Embedded Payloads, Sub-technique T1027.009 – Enterprise | MITRE ATT&CK®’. Accessed: Nov. 19, 2024. [Online]. Available: https://attack.mitre.org/techniques/T1027/009/
[28] ‘Process Injection, Technique T1055 – Enterprise | MITRE ATT&CK®’. Accessed: Nov. 27, 2024. [Online]. Available: https://attack.mitre.org/techniques/T1055/
[29] ‘Obfuscated Files or Information, Technique T1027 – Enterprise | MITRE ATT&CK®’. Accessed: Nov. 14, 2024. [Online]. Available: https://attack.mitre.org/techniques/T1027/
[30] ‘API Obfuscation – Unprotect Project’. Accessed: Nov. 14, 2024. [Online]. Available: https://unprotect.it/technique/api-obfuscation/
[31] P. Black, I. Gondal, and R. Layton, ‘A survey of similarities in banking malware behaviours’, Comput. Secur., vol. 77, pp. 756–772, Aug. 2018, doi: 10.1016/j.cose.2017.09.013.
[32] ‘What is Polymorphic Malware? Examples & Challenges’, SentinelOne. Accessed: Nov. 14, 2024. [Online]. Available: https://www.sentinelone.com/cybersecurity-101/threat-intelligence/what-is-polymorphic-malware/
[33] ‘What is a Polymorphic Virus? Examples & More | CrowdStrike’, CrowdStrike.com. Accessed: Nov. 14, 2024. [Online]. Available: https://www.crowdstrike.com/en-us/cybersecurity-101/malware/polymorphic-virus/
[34] ‘Understanding Evil: How to Reverse Engineer Malware | Huntress’. Accessed: Nov. 14, 2024. [Online]. Available: https://www.huntress.com/blog/understanding-evil-how-to-reverse-engineer-malware