Wednesday, February 26, 2014

Dissecting the PE File Format - 1

PE (Portable Executable) is the native Win32 file format. But why learn about it?
 - Adding code to executables
 - Manually unpacking executables

In a packed executable, the import tables are typically destroyed or encrypted. The packer contains a stub that unpacks the original code at run time and then jumps to the original entry point. Even if we manage to dump the memory region after the packer has finished unpacking the executable, we still need to fix the sections and import tables before the dumped app will run. This cannot be done without knowing the PE format.


At a minimum, a PE file has 2 sections - code & data.

There are 9 predefined sections - .text (executable code), .bss (uninitialized data), .data (data section), .rdata (read-only data), .rsrc (resources), .edata (export data), .idata (import data), .pdata and .debug.

The names are largely irrelevant and exist only to assist the programmer.
The layout of the data structures in a PE file is the same on disk and in memory. However, the file is not copied into memory byte for byte. The Windows loader decides which parts need to be mapped in and omits the others. Data that is not mapped, e.g. debug information, is placed at the end of the file, past any parts that are mapped.

The location of an item in the file on disk usually differs from its location once loaded into memory, because of the page-based virtual memory management that Windows uses. When the sections are loaded into RAM, they are aligned to 4 KB memory pages, with each section starting on a new page.

Virtual memory can be considered an invisible layer between the software and the physical memory. Every time an attempt is made to access memory, a page table is consulted, which indicates which physical memory address to use. It's not practical to have a page table entry for every byte of memory, so the processor divides memory into pages. This has several advantages:

  • enables creation of multiple address spaces. An address space is an isolated page table that only allows access to memory pertinent to the current program or process. It keeps programs isolated and ensures that they don't affect each other.
  • enables the programmer to enforce certain rules on how memory is used. At load time, the memory manager sets access rights on the memory pages of each section based on that section's settings. This determines whether a given section is readable, writeable or executable. Since access rights apply to whole pages, each section has to start on a fresh page.
    The default page size for Windows is 1000h (4 KB), and it would be wasteful to align executables to a 4 KB page boundary on disk. Hence, the PE header has 2 different alignment fields - section alignment and file alignment. Section alignment is how sections are aligned in memory. File alignment (200h) is how sections are aligned in the file on disk; it's a multiple of the disk sector size to optimize the loading process.
  • enables a paging file to be used on the hard drive to temporarily store pages from physical memory while they are not in use. An app, if idle, can be paged out to disk to make room for another app which needs to be loaded into RAM.
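
The difference between the two alignment fields can be illustrated with a small Python sketch. It assumes the typical values mentioned above (1000h for section alignment, 200h for file alignment); the helper name align_up and the section size 1234h are made up for illustration and are not taken from any real executable.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)

SECTION_ALIGNMENT = 0x1000  # assumed in-memory alignment: one 4 KB page
FILE_ALIGNMENT = 0x200      # assumed on-disk alignment: one 512-byte sector

raw_size = 0x1234  # hypothetical amount of data in a section

size_on_disk = align_up(raw_size, FILE_ALIGNMENT)
size_in_memory = align_up(raw_size, SECTION_ALIGNMENT)

print(hex(size_on_disk), hex(size_in_memory))  # prints: 0x1400 0x2000
```

The same section takes 1400h bytes in the file but 2000h bytes (two full pages) once mapped, which is why file offsets and memory offsets drift apart.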
When a PE file is loaded into memory, the in-memory version is known as a module. The starting address where the file mapping begins is called the HMODULE. A module in memory represents all the code, data and resources from an executable file that are needed for execution, whilst the term process basically refers to an isolated address space which can be used for running such a module.

THE DOS HEADER
At the start of a PE file are the 64 bytes of the DOS header. If the program is run in DOS, the DOS stub located immediately after the header is executed. The DOS stub generally just prints a string like "This program cannot be run in DOS mode". When building an application, the linker links a default stub program called WINSTUB.EXE into your executable. This can be replaced using the -STUB linker option.
The DOS header has 19 members, of which magic and lfanew are of interest.
magic = the bytes 4Dh, 5Ah, which signify a valid DOS header. They translate to "MZ" in ASCII.
lfanew = a DWORD at the end of the DOS header, just before the stub. It contains the offset of the PE header. The Windows loader reads this offset so it can skip the DOS stub and go directly to the PE header.
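
As a rough sketch of how these two fields could be read, the Python snippet below checks the MZ magic and extracts lfanew from the last 4 bytes of the 64-byte header (offset 3Ch), stored in little-endian order. The function name read_e_lfanew and the synthetic sample buffer are illustrative assumptions, not part of any Windows API.

```python
import struct

DOS_HEADER_SIZE = 64
E_LFANEW_OFFSET = 0x3C  # last DWORD of the 64-byte DOS header

def read_e_lfanew(data):
    """Validate the DOS header magic and return the offset of the PE header."""
    if len(data) < DOS_HEADER_SIZE:
        raise ValueError("too short to contain a DOS header")
    if data[:2] != b"MZ":  # 4Dh, 5Ah
        raise ValueError("missing MZ magic")
    # The field is stored as a little-endian 32-bit value.
    (e_lfanew,) = struct.unpack_from("<I", data, E_LFANEW_OFFSET)
    return e_lfanew

# Synthetic header for demonstration only (not a real executable):
sample = bytearray(DOS_HEADER_SIZE)
sample[0:2] = b"MZ"
struct.pack_into("<I", sample, E_LFANEW_OFFSET, 0x100)

print(hex(read_e_lfanew(bytes(sample))))  # prints: 0x100
```

For a real file, the bytes would come from something like open(path, "rb").read(), where path is whatever executable you want to inspect.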

The last 4 bytes of the DOS header in the figure read 00 00 01 00, which is the offset to the PE header. The PE header begins with the signature 50 45 00 00 (the letters "PE" followed by two zero bytes). Finding a different signature at that offset indicates a different file format:
NE = 16-bit Windows New Executable
LE = Windows 3.x virtual device driver
LX = OS/2 2.0
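
A minimal sketch of the signature check in Python follows. It assumes the offset passed in is the value read from lfanew; the function name classify_signature is hypothetical, and the NE, LE and LX signatures are treated as the 2-byte ASCII markers listed above.

```python
def classify_signature(data, offset):
    """Return the kind of header found at offset: PE, NE, LE, LX or unknown."""
    sig = data[offset:offset + 4]
    if sig == b"PE\x00\x00":  # 50 45 00 00
        return "PE"
    two = sig[:2]
    if two in (b"NE", b"LE", b"LX"):
        return two.decode("ascii")
    return "unknown"

# Synthetic buffers for demonstration only:
print(classify_signature(b"PE\x00\x00", 0))  # prints: PE
print(classify_signature(b"NE\x05\x3c", 0))  # prints: NE
```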

The PE header will be continued in the next post.
