Vedant Bhatt: Reverse Engineering 2 - Disassembly Algorithms in Use

Linear Sweep Disassembly
This algorithm takes a straightforward approach: where one instruction ends, another begins. The most difficult decision is where to begin. The usual solution is to assume that everything contained in sections of a program marked as code represents machine language instructions. Disassembly begins with first byte in a code section and moves, in a linear fashion, through the section, disassembling one instruction after another until the end of the section is reached. It does not try to understand the programs control flow.

During the disassembly process, a pointer can be maintained to mark the beginning of the instruction being disassembled. As part of the disassembly process, length of each instruction is computed and used to determine the location of the next instruction to be disassembled. Fixed-length instruction sets are easier to disassemble.

The main advantage is that it covers complete coverage of a program's code sections. The primary disadvantage is that if fails to account for the fact that data might be combined with code.

Recursive Descent Disassembly
This focuses more on control flow and determine whether an instruction should be disassembled or not based on whether its referenced by another instruction.

Sequential Flow Instructions
They pass execution to the instruction that immediately follows. They include add, register-to-memory transfer instructions such as mov. Disassembly here proceeds as with linear sweep.

Conditional Branching instructions
Instructions like jnz offer two execution paths. If it evaluates to true, the branch is taken and the instruction pointer is changed to reflect the target of the branch. If it is false, it continues execution in a linear fashion and a linear sweep can be used to disassemble the next instruction. As it is not statically possible to determine the outcome of the condition, the recursive descent algorithms disassembled both paths, deferring disassembly of the branch target instruction by adding the address of the target instruction to a lost of addresses to be disassembled at a later point.

Unconditional Branching instructions
A recursive descent disassembler attempts to determine the target of the unconditional jump and add the destination address to the list of addresses that have yet to be explored. However, when the target of the unconditional jump depends on a runtime value, it may not be possible to determine the destination using static analysis. Common example of this is jmp eax

Function call Instructions
They operate similar to unconditional branches including the drawbacks, with the expectation that execution usually returns to the instruction immediately following the call instruction once the function completes. In this regard, they are similar to conditional branch instructions in that they generate two execution paths. The target address is added to a list of deferred disassembly, while the instruction immediately following the call is disassembled in a manner similar to linear sweep. Recursive descent would fail if programs do not behave as expected when returning from called functions.

Return instructions
A function return instruction offers no information about what instruction will be executed next. If the program were actually running, an address would be taken from the top of the runtime stack, and execution would resume at the address. Disassemblers do not have the benefit of a stack and come to a halt. It then turns to the list of addresses it has kept aside for deferred disassembly. An address is removed from that list and disassembly is continued from that address,

Recursive descent advantage - ability to distinguish code from data.
disadvantage - inability to follow indirect code paths such as jumps or calls which uses tables of pointers to look up a target address.

Vedant Bhatt

Wednesday, December 4, 2013

Reverse Engineering 2 - Disassembly Algorithms in Use

No comments:

Post a Comment