Reverse Engineering

From HackThePlanet - Murdoch Hackers Club

Overview

Reverse engineering (RE) is an umbrella term for a process in which a system's design is (partially) reconstructed, usually with little or no access to the original source code or documentation. Traditionally it means understanding something by taking it apart and analysing its components.

Why would you want to learn it?

It’s fun to learn new skills. RE is a problem-solving skill that will help you in many other areas of infosec research, such as binary exploitation. Learning RE techniques can give you a greater appreciation of how complex computers are. Also, IDA looks pretty dope when you have it open.

RE is not limited to binaries: you could also reverse some obfuscated JavaScript, or any unfamiliar code, to see how it works. It’s a versatile tool to have at your disposal in your hacking arsenal.

Targets of Reverse Engineering

  • Computer programs: usually reverse engineered to find security flaws, modify functionality, or steal design elements (corporate espionage).
  • Serialised data structures (e.g. file formats): usually reverse engineered to break up an implementation monopoly, and occasionally to find security flaws.


This article covers the two main reverse engineering techniques: static and dynamic analysis.

Static Analysis

Static analysis means analysing a binary without executing it: the binary is examined as a file, not as a running process. It is done with disassemblers, decompilers, hex editors, and similar tools. This article focuses on disassemblers (for now).

The examples in this article are 32-bit x86 Linux ELF binaries, disassembled into Intel-syntax x86 assembly.

To introduce static and dynamic analysis, we’ll use the following code. Whenever "simple binary" or "simple.elf" is mentioned, it refers to this code, compiled as shown below.

#include <stdio.h>

int main() {
	int a = 1;
	int b = 2;
	a = a + b;

	return 0;
}

Compiled with: gcc simple.c -m32 -O0 -o simple.elf


Disassemblers

A disassembler takes your binary as input and translates the machine instructions it contains into a human-readable, low-level language: assembly.

Most modern disassemblers can:

  • Find embedded symbols (names for variables, functions, etc.).
  • Explicitly mark specific parts of the binary as either data or text (code), mapping the binary's regions.
  • Find structs, enums, strings, functions, types, etc.
  • Draw a graph showing the program's control flow.
  • Show imports and exports of libraries and external functions, and more!


Download links are listed in the Tools section below.


objdump

If we examine main's code in the .text section of the simple binary with a hex editor, we can see its machine instructions represented as hexadecimal bytes. (objdump -d simple.elf shows the same bytes: it prints each instruction's machine code in hex next to the disassembly, so you can keep just the hex column.)

...
55 89 e5 83 ec 0c 31 c0 c7 45 fc
00 00 00 00 c7 45 f8 01 00 00 00
c7 45 f4 02 00 00 00 8b 4d f8 03
4d f4 89 4d f8 83 c4 0c 5d c3
...

This doesn’t mean much to us on its own, but with an x86 opcode lookup table you could translate it into something readable, such as assembly.

Here are the same bytes disassembled into Intel-syntax assembly:

push   ebp
mov    ebp, esp
sub    esp, 0xc
xor    eax, eax
mov    DWORD PTR [ebp-0x4], 0x0
mov    DWORD PTR [ebp-0x8], 0x1
mov    DWORD PTR [ebp-0xc], 0x2
mov    ecx, DWORD PTR [ebp-0x8]
add    ecx, DWORD PTR [ebp-0xc]
mov    DWORD PTR [ebp-0x8], ecx
add    esp, 0xc
pop    ebp
ret

This output was produced with objdump -M intel -d simple.elf and cleaned up to show only the assembly instructions.


IDA
Disassembly graph view of simple.elf

Using IDA, we can see it has given us some symbols (var_C, var_8, etc.). It has also found the stack offsets for argc, argv, and envp, even though we didn't declare them explicitly in the C code.

These offsets are used with ebp to locate local variables in main’s stack frame. In more complicated programs that use conditional branches, the graph view also shows the control flow; we’ll look at that soon.

Just something worth noting:

Among the symbols IDA has found, var_4 appears to be unused. You can also see it on the fifth line of the objdump output: mov DWORD PTR [ebp-0x4], 0x0.

This looks like an artefact the compiler left behind, most likely because we compiled with the -O0 flag, which tells the compiler not to optimise the code. We had to do this because, with optimisation on, the function would simply return 0 without computing a + b at all.




Commenting your assembly

This is a good technique for getting an idea of how your program actually runs.

Rather than having a brief overview of the control flow, you can follow the instructions, and note down exactly what happens. This is essentially a manual way of performing a hybrid of Static and Dynamic analysis, as you're running through the code flow in your head.

The only downside is that this takes a long time. This is where dynamic analysis comes into play!

push   ebp                       ; Save the caller's base pointer.
mov    ebp, esp                  ; Set up the new stack frame (EBP = ESP).
sub    esp, 0xc                  ; Reserve 0xc (12) bytes for local variables.
xor    eax, eax                  ; Clear EAX (EAX holds the return value, so main returns 0).
mov    DWORD PTR [ebp-0x4], 0x0  ; Store 0x0 in local variable ebp-0x4 (the unused var_4).
mov    DWORD PTR [ebp-0x8], 0x1  ; Store 0x1 in local variable ebp-0x8 (a = 1).
mov    DWORD PTR [ebp-0xc], 0x2  ; Store 0x2 in local variable ebp-0xc (b = 2).
mov    ecx, DWORD PTR [ebp-0x8]  ; Load a into ECX.
add    ecx, DWORD PTR [ebp-0xc]  ; Add b to ECX.
mov    DWORD PTR [ebp-0x8], ecx  ; Store ECX back into a (a = a + b).
add    esp, 0xc                  ; Release the locals (ESP back to its earlier value).
pop    ebp                       ; Restore the caller's base pointer.
ret                              ; Pop the return address into EIP.

Dynamic Analysis

Unlike static analysis, dynamic analysis means executing a binary and following its process as it runs. You can do everything you could do statically, with the advantage of running the code and seeing how it actually modifies registers and memory. This makes analysis faster, and lets you record every action that takes place.

The two main tools used are debuggers and memory editors. We’ll focus on debuggers (which can usually edit memory too).


Debuggers

A debugger takes your binary as input, creates a running process from it, and attaches itself to that process. It can halt, step through, and modify all aspects of the running process. We’ll be using GDB in this article.

Most debuggers can:

  • Do most of what a disassembler can do, and more.
  • Disassemble the program's instructions, show which instruction will run next, and step through them.
  • Read and write memory (heap, stack) and map memory regions.
  • Inspect and modify register values.
  • Manipulate and track program state.


Download links are listed in the Tools section below.


GDB

Loading simple.elf into GDB (gdb simple.elf) and disassembling the main function gives the following:

See also: Common GDB Commands

gdb simple.elf -q
Reading symbols from simple.elf...(no debugging symbols found)...done.
(gdb) disassemble main
Dump of assembler code for function main:
   0x00001f80 <+0>:		push   ebp
   0x00001f81 <+1>:		mov    ebp,esp
   0x00001f83 <+3>:		sub    esp,0xc
   0x00001f86 <+6>:		xor    eax,eax
   0x00001f88 <+8>:		mov    DWORD PTR [ebp-0x4],0x0
   0x00001f8f <+15>:	mov    DWORD PTR [ebp-0x8],0x1
   0x00001f96 <+22>:	mov    DWORD PTR [ebp-0xc],0x2
   0x00001f9d <+29>:	mov    ecx,DWORD PTR [ebp-0x8]
   0x00001fa0 <+32>:	add    ecx,DWORD PTR [ebp-0xc]
   0x00001fa3 <+35>:	mov    DWORD PTR [ebp-0x8],ecx
   0x00001fa6 <+38>:	add    esp,0xc
   0x00001fa9 <+41>:	pop    ebp
   0x00001faa <+42>:	ret    
End of assembler dump.

If your output is in AT&T syntax instead of the Intel syntax shown above, set GDB's disassembly flavor to Intel:

set disassembly-flavor intel

You can add that line to ~/.gdbinit so it's executed when you start up gdb.


Next we'll set a breakpoint at the start of main, run the program, and disassemble to see which instruction we’ve landed on when the breakpoint is hit.

To set a breakpoint at an address in GDB, use break *<address>, where the address can be a raw address or a symbol plus an optional offset (e.g. *main or *main+38).

(gdb) break *main
Breakpoint 1 at 0x1f80
(gdb) run 
Starting program: /Users/nandayo/Desktop/simple.elf 

Breakpoint 1, 0x00001f80 in main ()
(gdb) disassemble 
Dump of assembler code for function main:
=> 0x00001f80 <+0>:		push   ebp
   0x00001f81 <+1>:		mov    ebp,esp
   0x00001f83 <+3>:		sub    esp,0xc
   0x00001f86 <+6>:		xor    eax,eax
   0x00001f88 <+8>:		mov    DWORD PTR [ebp-0x4],0x0
   0x00001f8f <+15>:	mov    DWORD PTR [ebp-0x8],0x1
   0x00001f96 <+22>:	mov    DWORD PTR [ebp-0xc],0x2
   0x00001f9d <+29>:	mov    ecx,DWORD PTR [ebp-0x8]
   0x00001fa0 <+32>:	add    ecx,DWORD PTR [ebp-0xc]
   0x00001fa3 <+35>:	mov    DWORD PTR [ebp-0x8],ecx
   0x00001fa6 <+38>:	add    esp,0xc
   0x00001fa9 <+41>:	pop    ebp
   0x00001faa <+42>:	ret    
End of assembler dump.


What if we wanted to see the final result of a + b from our C program? We know this is the instruction that stores the result of a + b back into a:

 0x00001fa3 <+35>:	mov    DWORD PTR [ebp-0x8],ecx

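We can print that location after the instruction has executed.

To do that, set a breakpoint on the next instruction (main+38), so we know the store has completed. Then examine one DWORD at [ebp-0x8] in hex using x/dwx $ebp-0x8. Here x is GDB's examine command; in the format suffix, w selects a 4-byte word (a DWORD) and x selects hexadecimal. The d (decimal) letter is redundant: GDB uses the hex format, as the output shows, so x/wx $ebp-0x8 is the more usual way to write it.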

(gdb) break *main+38
Breakpoint 2 at 0x1fa6

(gdb) continue 
Continuing.

Breakpoint 2, 0x00001fa6 in main ()

(gdb) x/dwx $ebp-0x8
0xbffffae0:	0x00000003     <=  THE RESULT!

Practise

Now that you have the basics, try the “Firetruck” challenge from the C2C 2016 CTF Event.

You can find the challenge hosted at: ctf.hacktheplanet.club/challenges#Firetruck

Give it a go before you watch the solution: here

You can solve this challenge with static analysis alone, but feel free to use whatever tools you like. Try using IDA, or learn a similar tool such as the ones mentioned in this article!

Resources

Tools

  • IDA - Website (Debugger / Disassembler)
  • Hopper - Website (Debugger / Disassembler)
  • radare2 - Website (Debugger / Disassembler)
  • Binary Ninja - Website (Disassembler)
  • GDB - Website (Debugger) Can be installed through your *nix package manager

Useful Links