Vedant Bhatt: Dissecting the PE File Format

PE HEADER

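PE header is a general term for the structure named IMAGE_NT_HEADERS.

It has 3 members (Signature, FileHeader and OptionalHeader) and is defined in windows.inc.

Signature is always the DWORD "PE" followed by two null bytes (50h 45h 00h 00h), as described in the previous post.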

The next 20 bytes are the FileHeader, which contains info about the physical layout and properties of the file, e.g. the number of sections. The OptionalHeader is always present, despite its name, and contains info about the logical layout of the PE file, e.g. the address of the entry point.
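To make the first member concrete, here is a minimal Python sketch that locates the PE header through e_lfanew and checks the signature. The function name find_pe_header is made up for this illustration, and the code assumes the whole file has already been read into memory.

```python
import struct

def find_pe_header(data: bytes) -> int:
    """Return the file offset of the PE signature, or raise ValueError."""
    # The DOS header is 40h bytes long and starts with "MZ".
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew is the DWORD at offset 3Ch; it holds the offset of the PE header.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    # Signature is the first member of IMAGE_NT_HEADERS.
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found at e_lfanew")
    return e_lfanew
```

The FileHeader then starts 4 bytes after the returned offset, and the OptionalHeader 20 bytes after that.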

FileHeader is defined as follows:

Most of these members are not of any use to us. If we add sections to the file, we must update the NumberOfSections field. Characteristics contains flags which indicate, among other things, whether the PE file is an executable or a DLL.

In a hex editor, we can find NumberOfSections by counting a DWORD (Signature) and a WORD (Machine), i.e. 6 bytes, from the start of the PE header.

This can be verified using PEView, as seen in the figure.
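The same lookup can be sketched in Python. The field order and the two flag values below are the standard IMAGE_FILE_HEADER ones; the FileHeader tuple, read_file_header and is_dll are names invented for this example, and pe_offset is assumed to be the offset of the "PE" signature.

```python
import struct
from typing import NamedTuple

# Two of the Characteristics flags.
IMAGE_FILE_EXECUTABLE_IMAGE = 0x0002
IMAGE_FILE_DLL = 0x2000

class FileHeader(NamedTuple):
    machine: int
    number_of_sections: int
    time_date_stamp: int
    pointer_to_symbol_table: int
    number_of_symbols: int
    size_of_optional_header: int
    characteristics: int

def read_file_header(data: bytes, pe_offset: int) -> FileHeader:
    # Skip the 4-byte Signature; the FileHeader is 20 bytes:
    # WORD, WORD, DWORD, DWORD, DWORD, WORD, WORD.
    return FileHeader(*struct.unpack_from("<HHIIIHH", data, pe_offset + 4))

def is_dll(header: FileHeader) -> bool:
    return bool(header.characteristics & IMAGE_FILE_DLL)
```

NumberOfSections ends up 6 bytes from the start of the PE header, which matches the DWORD plus WORD count above.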

PEiD is another useful tool which scans executables and reveals the packer that has been used to compress/encrypt them. It also has the Krypto ANALyzer plugin for detecting the use of cryptographic algorithms, e.g. MD5 and CRC. It can also use a user-defined list of packer signatures. This should be the first tool to use in any unpacking session.

Next, on to the OptionalHeader, which takes 224 bytes in a 32-bit PE file, the last 128 of which contain the DataDirectory.
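As a quick check of those numbers, the sketch below adds up the 32-bit OptionalHeader with Python's struct module. The format string is my own transcription of the fixed fields (Magic through NumberOfRvaAndSizes), so treat it as an assumption to verify against windows.inc.

```python
import struct

# Fixed fields of the 32-bit OptionalHeader, Magic through NumberOfRvaAndSizes.
PE32_OPTIONAL_FIXED = "<HBB9I6H4I2H6I"
# One IMAGE_DATA_DIRECTORY entry: VirtualAddress and isize, both DWORDs.
DATA_DIRECTORY_ENTRY = "<II"
NUM_DIRECTORIES = 16

fixed_size = struct.calcsize(PE32_OPTIONAL_FIXED)
directory_size = NUM_DIRECTORIES * struct.calcsize(DATA_DIRECTORY_ENTRY)
optional_header_size = fixed_size + directory_size

print(hex(fixed_size), hex(directory_size), hex(optional_header_size))
```

So 96 bytes of fixed fields plus 16 entries of 8 bytes each give the 224 bytes (E0h), which is also the value normally found in SizeOfOptionalHeader.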

AddressOfEntryPoint - The RVA of the first instruction that will be executed when the PE loader is ready to run the PE file. If you want to divert the flow of execution, you need to change this value. Packers generally redirect this value to their decompression stub, after which execution jumps back to the original entry point of the app.
Note - In StarForce protection, the CODE section is not present in the file on disk but is written to virtual memory on execution. The value in this field is therefore a VA.

ImageBase - The preferred load address of the PE file, e.g. if it is 400000h, the PE loader will try to load the file into virtual address space starting at 400000h. However, the loader may not load the file at this address if this space has already been occupied by some other module.
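Putting the two fields together: for a normal file, the address where execution starts is ImageBase plus the AddressOfEntryPoint RVA. A small Python sketch, assuming a 32-bit OptionalHeader (AddressOfEntryPoint at offset 16, ImageBase at offset 28) and that the file really is loaded at its preferred base; entry_point_va is a hypothetical helper name.

```python
import struct

def entry_point_va(optional_header: bytes) -> int:
    """Entry point VA when the image is loaded at its preferred ImageBase."""
    # Offsets are relative to the start of a 32-bit OptionalHeader.
    (entry_rva,) = struct.unpack_from("<I", optional_header, 16)
    (image_base,) = struct.unpack_from("<I", optional_header, 28)
    return image_base + entry_rva
```

If the loader has to pick another base, the same RVA is simply added to the actual load address instead.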

SectionAlignment - The granularity of the alignment of sections in memory. Generally 1000h. Each section must begin at a multiple of this value.

FileAlignment - The granularity of the alignment of sections in the file. Generally 200h (512 bytes).

SizeOfImage - The overall size of the PE image in memory. It is the sum of all headers and sections, each aligned to SectionAlignment.

SizeOfHeaders - The combined size of the headers and the section table, rounded up to FileAlignment. This value can probably also be used as the file offset of the first section.
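The alignment arithmetic is easy to get wrong by hand, so here is a rough Python sketch. Both helper names are made up, and size_of_image assumes the usual case where the sections follow the headers contiguously in memory with no gaps.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment

def size_of_image(size_of_headers: int, virtual_sizes, section_alignment: int = 0x1000) -> int:
    """Headers plus every section, each rounded up to SectionAlignment."""
    total = align_up(size_of_headers, section_alignment)
    for virtual_size in virtual_sizes:
        total += align_up(virtual_size, section_alignment)
    return total
```

For example, 400h of headers and three sections with virtual sizes 1A0h, 2200h and 10h occupy 1000h + 1000h + 3000h + 1000h = 6000h in memory.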

DataDirectory - An array of 16 IMAGE_DATA_DIRECTORY structures, each relating to an important data structure in the PE file. This structure will be discussed in the next post.

The overall layout of the PE header:

OllyDbg can also parse PE headers into a meaningful display. Open the file in Olly and press the M button on top to open the memory map - this shows how the sections of the PE file have been mapped.

Now right click on PE header, select dump to CPU. Next, in the hex window, right click again and select PE header.

There are specific points of interest here. If the last 2 members before the DataDirectory, LoaderFlags and NumberOfRvaAndSizes, are given bogus values:

e.g. LoaderFlags = ABDBFFDEh

NumberOfRvaAndSizes = DFFFDDEh

Olly treats this as a bad image and will eventually run the app without breaking at the entry point. To avoid this when analyzing malware, open it in a hex editor first and check that NumberOfRvaAndSizes is 10h.
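The hex editor check can also be scripted. This is only a sketch for 32-bit PE files: it assumes LoaderFlags and NumberOfRvaAndSizes sit at offsets 88 and 92 of the OptionalHeader, and read_loader_fields and looks_tampered are names invented here.

```python
import struct

EXPECTED_DIRECTORIES = 0x10  # 16 data directory entries

def read_loader_fields(data: bytes):
    """Return (LoaderFlags, NumberOfRvaAndSizes) of a 32-bit PE file."""
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    # The OptionalHeader follows the 4-byte Signature and 20-byte FileHeader.
    optional = e_lfanew + 4 + 20
    return struct.unpack_from("<II", data, optional + 88)

def looks_tampered(data: bytes) -> bool:
    _, number_of_rva_and_sizes = read_loader_fields(data)
    return number_of_rva_and_sizes != EXPECTED_DIRECTORIES
```

If looks_tampered returns True, patch the field back to 10h in the hex editor before loading the file in Olly.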

In addition, the SizeOfRawData field can be given a very high value for one of the sections. This causes difficulty in a lot of debugging and disassembly tools.
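One way to spot this trick is to compare each section's raw data range with the real file size. The sketch below is a heuristic of my own rather than anything the tools above do. It assumes the standard 40-byte section header layout and that the section table bytes have already been cut out of the file; oversized_sections is a hypothetical name.

```python
import struct

# Name, VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData,
# PointerToRelocations, PointerToLinenumbers, NumberOfRelocations,
# NumberOfLinenumbers, Characteristics: 40 bytes in total.
SECTION_HEADER = "<8sIIIIIIHHI"

def oversized_sections(section_table: bytes, count: int, file_size: int) -> list:
    """Names of sections whose raw data would run past the end of the file."""
    suspicious = []
    for i in range(count):
        fields = struct.unpack_from(SECTION_HEADER, section_table, i * 40)
        name = fields[0].rstrip(b"\x00").decode("ascii", "replace")
        size_of_raw_data, pointer_to_raw_data = fields[3], fields[4]
        if pointer_to_raw_data + size_of_raw_data > file_size:
            suspicious.append(name)
    return suspicious
```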

An interesting twist - You might have noticed some garbage data between the DOS stub and the PE header. The origin of this data, though not important, is quite interesting.

PE files produced using Microsoft development tools contain extra bytes after the DOS stub, inserted by the linker at link time. In all cases, the penultimate DWORD is "Rich". This data is not present in files produced by other linkers (Borland, gcc).

The data includes encrypted codes which identify the components used to build the PE file. The DWORD after "Rich" is a key generated by the linker, which repeats several times in the garbage data. When we compile a program, the compiler puts the string "@comp.id" followed by a DWORD-sized compiler ID number in our obj file. When we link our obj file, the linker extracts the @comp.id number, XORs it with the key and writes it in the "garbage" as the 2nd DWORD before "Rich".

The first DWORD of the garbage data is the key XORed with a hard-coded constant 536E6144h (the ASCII string "DanS"). If we search for @comp.id in the obj file and substitute the DWORD after it with zeroes, we see that the second DWORD before "Rich" is the key.
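The XOR scheme can be undone with a few lines of Python. The sketch below relies on the commonly described layout, which this post does not spell out in full: the "DanS" marker is followed by three padding DWORDs, then come pairs of (comp.id, count) DWORDs, all XORed with the key. decode_rich is a name made up for this example.

```python
import struct

DANS = 0x536E6144  # "DanS" read as a little-endian DWORD

def decode_rich(data: bytes):
    """Return (key, [(comp_id, count), ...]) or None if no "Rich" block is found."""
    rich = data.find(b"Rich")
    if rich < 0 or rich + 8 > len(data):
        return None
    # The key is the DWORD right after "Rich".
    (key,) = struct.unpack_from("<I", data, rich + 4)
    # Walk backwards one DWORD at a time until DWORD XOR key gives "DanS".
    start = rich - 4
    while start >= 0:
        (value,) = struct.unpack_from("<I", data, start)
        if value ^ key == DANS:
            break
        start -= 4
    else:
        return None
    # Assumed layout: marker + 3 padding DWORDs, then (comp_id, count) pairs.
    entries = []
    for offset in range(start + 16, rich, 8):
        comp_id, count = struct.unpack_from("<II", data, offset)
        entries.append((comp_id ^ key, count ^ key))
    return key, entries
```

The padding DWORDs are zero XORed with the key, which would explain why the key shows up several times in the garbage data.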

Hello World in a hex editor:

It is possible to patch the linker to stop this. SignFinder.exe by Asterix allows you to quickly find the code which needs patching in any version of Link.exe.

So open Link.exe in Olly and press Ctrl+G. Enter 0044510C (address from SignFinder + ImageBase). Then highlight that instruction, right click and select Binary->fill with NOPs.

Finally, right click again and select copy to executable->all modifications. Then click "copy all" and right click in the new window that pops up to save the file. If we use the patched linker to rebuild the same program, we see the extra bytes are gone.

The only differences are, of course, e_lfanew, TimeDateStamp and SizeOfHeaders.

In the next post, we talk about the data directory.

Vedant Bhatt

Thursday, February 27, 2014

Dissecting the PE File Format - 2
