Sunday, August 30, 2015

XEN ParaVirtualization support in Rekall

If you've ever taken a memory image of a Linux virtual machine running under XEN in paravirtualization mode and tried to analyze it, you'll have noticed that most of your plugins, if not all of them, don't work.

[1] XEN-testmachine-PVguest.mem 14:20:42> pslist
ERROR:rekall.1:No profiles match this image. Try specifying manually.
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
[...]
RuntimeError: Unable to find a valid profile for this image. Try using -v for more details.


The reason is that XEN's page tables are funky. XEN uses a technique known as direct mapping which significantly differs from how memory management is done in many other virtualization solutions.

"In order to virtualise the memory subsystem all hypervisors introduce an additional level of abstraction between what the guest sees as physical memory (often called pseudo-physical in Xen) and the underlying memory of the machine (machine addresses in Xen). This is usually done through the introduction of a Physical to Machine (P2M) mapping. Typically this would be maintained within the hypervisor and hidden from the guest Operating System through techniques such as the use of Shadow Page Tables."

So instead of trapping every single memory access from the guest and having the hypervisor do an additional translation, XEN uses a different model:
"The Xen paravirtualised MMU model instead requires that the guest be aware of the P2M mapping and be modified such that instead of writing page table entries mapping virtual addresses to the (pseudo-)physical address space it would instead write entries mapping virtual addresses directly to the machine address space by mapping performing the mapping from pseudo physical to machine addresses itself using the P2M as it writes its page tables. This technique is known in Xen as direct paging."


What direct mapping means is that instead of having page tables where each entry points to physical memory of the guest, XEN's page tables point to the physical memory of the host.


But why does XEN do this? The answer is performance.
Because the guest kernel knows it's virtualized and can cooperate with the hypervisor, XEN can afford to cut corners in virtualization. Awareness of, and cooperation with, the hypervisor is one of the strong points of paravirtualization: it requires modifying the guest kernel, but it opens up opportunities for speed improvements.

The guest kernel is allowed to know and use the host's physical memory for page table upkeep. In exchange, it has to maintain the page tables in cooperation with the host. This way, no heavy translation mechanism like shadow page tables is needed.

However, you can imagine that tinkering with the host's memory could allow a hostile kernel to subvert the host. XEN keeps the guest kernel in check because it runs in ring 3 and is not allowed to write directly into the host's memory. Instead, it must issue a hypercall (a call to the hypervisor) whenever it needs to update the page tables. The hypervisor performs sanity checks to make sure the guest kernel's request to modify page tables is not attempting to subvert the hypervisor itself or other VMs.

Throughout the article we'll be using XEN's terminology:
  • XEN refers to the physical memory of the host as machine memory.
  • The physical memory of the guest is called "pseudo-physical" or simply "physical" memory. To prevent confusion, we'll refer to guest physical memory as (pseudo-)physical.

XEN's direct mapping vs normal mapping

Let's take a deep dive into direct mapping. We'll compare how the page tables look on a regular AMD64 machine vs. how they look in a XEN guest. The host machine is 64-bit, runs the 3.13 kernel and has 32GB of RAM; the guest has 256MB and runs kernel 3.2. All examples will be in the Rekall console.

Let's start by taking a kernel symbol on a real machine and looking at how the MMU would translate it. We've picked linux_proc_banner, a kernel symbol that points to a format string (%s version %s) for the kernel version. It's present in all kernels and is what helps build the string in /proc/version.

To compute the expected physical address, we'll use the PAGE_OFFSET trick:

[1] kcore 13:03:23> hex(session.profile.get_constant("linux_proc_banner"))
             Out<6> '0xffffffff81800040L'

[1] kcore 13:03:46> hex(session.profile.get_constant("linux_proc_banner") - session.profile.GetPageOffset())
             Out<8> '0x1800040L'

linux_proc_banner is at 0xffffffff81800040, which corresponds to physical address 0x1800040.
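In other words, for symbols inside the kernel's mapping, translation is just a subtraction. As a trivial illustration from the Rekall console (this helper is not part of Rekall, it's just the trick written out):

def kernel_virt_to_phys(va):
    # Only valid for addresses inside the kernel's linear mapping.
    return va - session.profile.GetPageOffset()

# kernel_virt_to_phys(0xffffffff81800040) == 0x1800040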


PAGE_OFFSET aside, while the system is running, every virtual address translation is done by the MMU. Let's use the vtop plugin to show the virtual address resolution just like the MMU would do it. vtop outputs debug information about every step of the multi-level address translation:

[1] kcore 13:03:50> vtop(0xffffffff81800040L)

*************** 0xffffffff81800040 ***************
Virtual 0xffffffff81800040 Page Directory 0x1c0e000  (the DTB)
pml4e@ 0x1c0eff8 = 0x1c11067                         (1st step translation)
pdpte@ 0x1c11ff0 = 0x1c12063                         (2nd step translation)
pde@ 0x1c12060 = 0x80000000018001e1                  (3rd step translation)
Large page mapped
Physical Address 0x1800040                           (4th step, actual physaddr)

Deriving physical address from runtime physical address space:
Physical Address 0x1800040

vtop shows us that the physical address is indeed 0x1800040, as we predicted earlier, and also that each step of the translation is within physical memory (0x1c1XXXX is around 29MB).

Note that the value of the pde in this case (0x80000000018001e1) is not an actual physical address. Only part of the PDE value encodes the physical address: the top bit is the NX flag, the low bits are page flags, and the middle bits (0x1800000 here) are the physical base of the large page.
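To make that split concrete, here is a small illustration (not Rekall code) of how the physical address falls out of this 2MB large-page PDE:

pde = 0x80000000018001e1
base  = pde & 0x000FFFFFFFE00000   # bits 21-51: physical base -> 0x1800000
flags = pde & 0xFFF                # 0x1e1: present, accessed, dirty, large page, global
nx    = pde >> 63                  # bit 63: no-execute -> 1

va = 0xffffffff81800040
phys = base | (va & 0x1FFFFF)      # add the offset within the 2MB page -> 0x1800040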

Page tables in XEN


So how does this look in a XEN guest? We're going to do all this for a 3.2 XEN kernel with a Rekall version without XEN support (I disabled it on purpose). For starters, let's list the available physical memory ranges:

[1] XEN-testmachine-PVguest.mem 11:10:09> for (pa, va, sz) in session.physical_address_space.get_address_ranges():
        print "%s - %s" % (hex(pa), hex(pa+sz))
                                      |.> 
0x0 - 0x10000000

If we are to inspect this image, DTB validation will most likely fail, but we can do the validation ourselves.
In AMD64, the DTB is identified by the symbol init_level4_pgt.

[1] XEN-testmachine-PVguest.mem 11:10:32> hex(session.profile.get_constant("init_level4_pgt"))
                                   Out<8> '0xffffffff81c05000L'

[1] XEN-testmachine-PVguest.mem 11:10:50> hex(session.profile.get_constant("init_level4_pgt") - session.profile.GetPageOffset())
                                   Out<9> '0x1c05000L'

The DTB should be 0x1c05000. Now, let's get the symbol for linux_proc_banner for this image:

[1] XEN-testmachine-PVguest.mem 11:11:16> hex(session.profile.get_constant("linux_proc_banner"))
                                  Out<10> '0xffffffff81800020L'

Meaning its physical address should end up being 0x1800020. Let's try to validate this:

[1] XEN-testmachine-PVguest.mem 11:11:37> session.physical_address_space.read(0x1800020, 14)
                                  Out<12> '%s version %s '

Perfect!

Now, we have to see what the MMU would see. That is, what the translation following the DTB at 0x1c05000 would say:

[1] XEN-testmachine-PVguest.mem 11:12:06> from rekall.plugins.addrspaces import amd64
[1] XEN-testmachine-PVguest.mem 11:12:09> adrs = amd64.AMD64PagedMemory(dtb=0x1c05000, session=session, base=session.physical_address_space)
[1] XEN-testmachine-PVguest.mem 11:12:10> t = session.GetRenderer()
[1] XEN-testmachine-PVguest.mem 11:12:21> with t:
    for x in adrs.describe_vtop(0xffffffff81800020L):
        x.render(t)
                                     |..> 
pml4e@ 0x1c05ff8 = 0x28417067
pdpte@ 0x28417ff0 = 0x0
Invalid PDPTE

pde@ 0x60 = 0x0


As you can see, the first translation step returns a physical address that's not in the guest's physical memory range: the address portion of 0x28417067 is 0x28417000, which is around 644MB. Way above 256MB.

When we try to read from an address that high, "pdpte@ 0x28417ff0" returns 0. It's not that there's a 0 at that address; returning 0 is what Rekall does when you try to read from an invalid address.

0x28417067 is, in fact, pointing to the host's physical memory. That is, this is a machine address.
However, we only have access to (pseudo-)physical memory, the physical memory of the guest.

Here's what's happening:


Because the page tables point to the host's physical memory, we can't resolve them directly from the page tables as usual.

So how do we solve this?

Enter XEN translation


As was mentioned at the start, XEN maintains a mapping of (Pseudo-)Physical to Machine addresses called a P2M mapping.

What we need is a machine to (pseudo-)physical mapping. Or M2P.

Both the P2M mapping and the M2P mapping are pointed to by symbols accessible from the guest kernel. However, there's an important difference. Let's take a look at them:

[1] XEN-testmachine-PVguest.mem 11:13:35> hex(session.profile.get_constant("p2m_top"))
                                  Out<21> '0xffffffff81ddede8L'

p2m_top is a pointer. Let's see where it points to:

[1] XEN-testmachine-PVguest.mem 11:14:44> hex(0xffffffff81ddede8L - session.profile.GetPageOffset())
                                  Out<23> '0x1ddede8L'           (physical address of p2m_top)
[1] XEN-testmachine-PVguest.mem 11:15:18> hex(struct.unpack_from("<Q", session.physical_address_space.read(0x1ddede8L, 8))[0])
                                  Out<27> '0xffffffff81f2e000L'  (where p2m_top points to)

p2m_top points to 0xffffffff81f2e000. That corresponds to 0x1f2e000 in physical memory, which is within (pseudo-)physical memory, so we can reach it. Let's see about machine_to_phys_mapping (M2P):

[1] XEN-testmachine-PVguest.mem 12:34:06> hex(session.profile.get_constant("machine_to_phys_mapping"))
                                   Out<2> '0xffffffff81c0f808L'

machine_to_phys_mapping is also a pointer. Let's see where it points to:

[1] XEN-testmachine-PVguest.mem 12:34:07> hex(session.profile.get_constant("machine_to_phys_mapping") - session.profile.GetPageOffset())
                                   Out<3> '0x1c0f808L'
[1] XEN-testmachine-PVguest.mem 12:35:10> hex(struct.unpack_from("<Q", session.physical_address_space.read(0x1c0f808L, 8))[0])
                                   Out<7> '0xffff800000000000L'

Ok, machine_to_phys_mapping is a pointer to 0xffff800000000000. Well, that's a problem:
  1. You can't translate this address back to physical memory via the PAGE_OFFSET method.
  2. You can't translate it directly via the MMU:

[1] XEN-testmachine-PVguest.mem 12:35:11> t = session.GetRenderer()
[1] XEN-testmachine-PVguest.mem 12:39:09> from rekall.plugins.addrspaces import amd64
[1] XEN-testmachine-PVguest.mem 12:39:13> adrs = amd64.AMD64PagedMemory(dtb=0x1c05000, session=session, base=session.physical_address_space)
[1] XEN-testmachine-PVguest.mem 12:39:16> with t:
    for x in adrs.describe_vtop(0xffff800000000000L):
        x.render(t)
                                     |..> 
pml4e@ 0x1c05800 = 0x7fff9067      (outside the guest's phys memory range :( )
pdpte@ 0x7fff9000 = 0x0
Invalid PDPTE
pde@ 0x0 = 0x0

So how can you access machine_to_phys_mapping? You need to do it on the live system, from the kernel. But there are two problems with the current capabilities:

  1. When analyzing live memory, /dev/pmem only maps physical memory, so we can't ask the kernel to read from this address.
  2. When analyzing live memory, /proc/kcore doesn't map it either.

We have 2 options:
  1. Provide a driver that allows reading of virtual addresses. A /dev/vmem of sorts.
  2. Use P2M, which we can access, and try inverting it. After all, M2P should be a reverse P2M.
Since solution [1] means compiling and inserting a new module for every kernel version, and we like solutions that work universally out of the box, we explored [2].

Inverting P2M

As explained in arch/x86/xen/p2m.c, p2m_top is essentially a 3-level tree to perform (pseudo-)physical to machine address resolution. Very much like virtual to physical translation works in x86, p2m_top translates (pseudo-)physical addresses to machine addresses. What we're looking for is the reverse: a machine to guest physical translation, or M2P.

p2m_top takes a PFN (a (pseudo-)physical frame number) and translates it to an MFN (a machine frame number). So we experimented with code to parse it; here's some debug information for a single resolution of PFN 0 to its MFN:

p2m_top[0] = 0xffffffff81f30000
p2m_top[0][0] = 0xffffffff84673000
p2m_top[0][0][0] = 0x7f328

What this means is that the guest's address range 0x0 - 0xFFF corresponds to the host's range 0x7f328000 - 0x7f328FFF. If we reverse this: whenever the page tables reference a host address in the 0x7f328000 - 0x7f328FFF range, we can find its backing in the guest's physical memory starting at offset 0. So if we ever find a reference in the page tables to 0x7f328000, we now know where to find it.

As you can see, we were able to follow the p2m_top tree completely for the first entry. All addresses were within addressable space in the guest kernel memory.

And, in fact, if we repeat this for the rest of the tree and invert it we can now resolve all machine to (pseudo-)physical translations.
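To make the structure concrete, here is a rough sketch of walking and inverting the tree from the Rekall console. This is not Rekall's actual implementation: it assumes a 64-bit guest where every level of the tree is one page of 512 eight-byte entries, it ignores the special "missing"/"identity" markers a real P2M tree contains (a real walker must skip them), and p2m_top_pa stands for the (pseudo-)physical address that p2m_top resolves to (0x1f2e000 in this image):

import struct

PAGE_OFFSET = session.profile.GetPageOffset()
phys_as = session.physical_address_space

def read_entries(pa):
    # Read one page (512 x 8-byte entries) from (pseudo-)physical memory.
    return struct.unpack("<512Q", phys_as.read(pa, 4096))

def virt_to_phys(va):
    # Top and mid level entries are kernel virtual addresses; convert them back.
    return va - PAGE_OFFSET

m2p = {}
pfn = 0
p2m_top_pa = 0x1f2e000                                    # assumption for this image
for top_entry in read_entries(p2m_top_pa):                # level 1: pointers to mid pages
    for mid_entry in read_entries(virt_to_phys(top_entry)):   # level 2: pointers to leaf pages
        for mfn in read_entries(virt_to_phys(mid_entry)):     # level 3: the actual MFNs
            m2p[mfn] = pfn                                # invert: machine frame -> guest frame
            pfn += 1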

Now, remember the vtop we did earlier?

[1] XEN-testmachine-PVguest.mem 14:04:41> with t:
        for x in adrs.describe_vtop(0xffffffff81800020L):
                x.render(t)
                                     |..>

pml4e@ 0x1c05ff8 = 0x28417067
pdpte@ 0x28417ff0 = 0x0
Invalid PDPTE

pde@ 0x60 = 0x0


We didn't know where to fetch 0x28417067 (around 644MB), as it's past the guest's physical memory (remember, it's a machine address pointing into the host). However, once we've parsed the P2M and inverted it, we know how to convert it back to a physical address:

[1] XEN-testmachine-PVguest.mem 14:04:46> hex(session.kernel_address_space.m2p_mapping[0x28417])
                                  Out<15> 0x1c07

This means machine frame number 0x28417 corresponds to physical frame number 7175 (0x1c07).

In plain words, whenever the page tables refer to an address between 0x28417000 and 0x28417FFF, we can find its actual backing in the guest's physical memory between 0x1c07000 and 0x1c07FFF. 0x1c07000 is about 29MB, which is well within the guest's physical memory, as it should be.
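With the inverted mapping in hand, rewriting a machine address found in a page table back into a guest physical address is just a frame lookup plus the page offset. A minimal illustration, assuming the m2p dictionary sketched above:

def machine_to_phys(machine_addr, m2p):
    mfn = machine_addr >> 12                     # machine frame number
    pfn = m2p[mfn]                               # guest (pseudo-)physical frame number
    return (pfn << 12) | (machine_addr & 0xFFF)  # keep the offset within the page

# machine_to_phys(0x28417ff0, {0x28417: 0x1c07}) == 0x1c07ff0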

Once we implemented detection of XEN paravirtualization and automatic address translation after parsing the P2M tables, we were able to examine XEN guests transparently from within the guest by just accessing its (pseudo-)physical memory. This means the approach also works when doing live analysis.

And with the current implementation in the Rekall HEAD, what you'll see instead is:

[1] XEN-testmachine-PVguest.mem 16:22:36> session.kernel_address_space
                                   Out<3> <XenParaVirtAMD64PagedMemory @ 0x7fab1dc5c49 Kernel AS@0x1c05000>
[1] XEN-testmachine-PVguest.mem 16:22:34> vtop(0xffffffff81800020)

*************** 0xffffffff81800020 ***************
Virtual 0xffffffff81800020 Page Directory 0x1c05000

(XEN resolves MFN 0x28417067 to PFN 0x1c07067)
pml4e@ 0x1c05ff8 = 0x1c07067

(XEN resolves MFN 0x28413067 to PFN 0x1c0b067)
pdpte@ 0x1c07ff0 = 0x1c0b067
pde@ 0x1c0b060 = 0x4705067
pte@ 0x4705000 = 0x1800025
Physical Address 0x1800020

Deriving physical address from runtime physical address space:
Physical Address 0x1800020

[1] XEN-testmachine-PVguest.mem 16:22:34> pslist

  Offset (V)      Name      PID    PPID   UID    GID        DTB             Start Time
-------------- ----------- ------ ------ ------ ------ -------------- ----------------------
0x88000ef10000 init           1      0      0      0 0x00000ba56000 2015-01-30 12:10:24+0000
0x88000ef11700 kthreadd       2      0      0      0 -              2015-01-30 12:10:24+0000
0x88000ef12e00 ksoftirqd/0    3      2      0      0 -              2015-01-30 12:10:24+0000
0x88000ef14500 kworker/0:0    4      2      0      0 -              2015-01-30 12:10:24+0000
0x88000ef15c00 kworker/u:0    5      2      0      0 -              2015-01-30 12:10:24+0000
0x88000ef30000 migration/0    6      2      0      0 -              2015-01-30 12:10:24+0000
0x88000ef31700 watchdog/0     7      2      0      0 -              2015-01-30 12:10:24+0000
[...]

We added initial support for analyzing XEN paravirtualized guests earlier in 2015 and we've been improving and refining it since. You can see XEN support in action in our TAP server (xen image). An example: pslist.

Conclusions

In this blog post, we've presented the challenges found in analyzing virtual machines running under XEN's paravirtualized model.

We also discussed two methods to overcome them and how we implemented support for 64-bit XEN paravirtualization in Rekall.

Over time we've discovered some issues with our approach, especially with later kernels in the 3.X branch, which I'll discuss in a follow-up article.

Let us know if you've used this functionality with success and want to give us a thumbs up, or if you've encountered problems, so we can help you solve them. Note that we don't support XEN PV for the 2.6.X kernel branch at the moment.

Friday, August 14, 2015

Windows Virtual Address Translation - Part 2.

We have previously discussed the Windows address translation mechanism back in 2014. As far as we know, Rekall is still the only memory forensic tool that actually performs accurate address translation. In this post we examine some of the new features in the latest Rekall release supporting advanced address translation, and how this is used in practice.
I recently attended the Twentieth IEEE Symposium on Computers and Communications, where we presented our paper titled Forensic Analysis of Windows User space Applications through Heap allocations. The paper covers the work we did in Rekall researching the Windows address translation algorithm and the Microsoft heap implementation (both topics were previously discussed on this blog).
The paper is quite large and covers a lot of ground. In this blog post we will focus on the first part, namely the address translation process. In a future blog post we will discuss the second part of the paper, namely using heap allocations for reverse engineering.
Since our original blog posts, we have discovered several cases which were not covered by the original research. It seems that the Windows address translation process is quite complex and subtle. In order to properly support the full algorithm we have rewritten the address translation code in Rekall from scratch. The new implementation has some interesting features:
  1. The implementation balances provenance with efficiency - It is always possible to query Rekall about how it arrived at a particular result. This is important when implementing complex address translation algorithms. You can inspect the address translation process, step by step, using the vtop() plugin.
  2. The new implementation is able to map files into the physical address space. This makes them available to the rest of Rekall transparently. For example, the pagefile may be mapped into the physical address space at a particular offset; a read() operation at that offset will then actually end up reading from the pagefile. Rekall’s address translation process therefore need only return an offset into the physical address space within the file’s mapping.
This second point is actually very cool as it can be used to map memory mapped files into the physical address space too. When a file is mapped into memory, the PTE corresponding to the virtual page may be pointing to a _SUBSECTION struct. In practice, on a running system, if an application tries to access this virtual address, a page fault occurs and Windows reads the file into a physical page on demand. Unfortunately, for memory forensic analysts it is impossible to recover this data from an image of physical memory alone - since the data is not actually in physical memory. So previously the best we could do was to show that this virtual address is a subsection PTE and where the data would be coming from (e.g. filename and offset inside the file).
With the new address translation code, it is possible for Rekall to resolve this if it can find the file itself. This requires that the mapped file be acquired together with the physical memory, but once this is done, Rekall will transparently map the file into the physical address space and serve read() requests from it. Here is an example:
[1] test.aff4 09:15:28> vtop 0x00013fb91000

****************** 0x13fb91000 ******************
Virtual 0x00013fb91000 Page Directory 0x2e142000
pml4e@ 0x2e142000 = 0x10f000002e85f867
pdpte@ 0x2e85f020 = 0x3b000002ebe0867
pde@ 0x2ebe0fe8 = 0x2d0000013821867
pte@ 0x13821c88 = 0xf8a001ca40600400
[_MMPTE_PROTOTYPE Proto] @ 0x000013821c88
Offset             Field              Content
------ ------------------------------ -------
  0x-1   Proto                          <_MMPTE Pointer to [0xF8A001CA4060] (Pointer)>
  0x0    Protection                      [Enumeration:Enumeration]: 0x00000000 (MM_ZERO_ACCESS)
  0x0    ProtoAddress                    [BitField(16-64):ProtoAddress]: 0xF8A001CA4060
  0x0    Prototype                       [BitField(10-11):Prototype]: 0x00000001
  0x0    ReadOnly                        [BitField(8-9):ReadOnly]: 0x00000000
  0x0    Unused0                         [BitField(1-8):Unused0]: 0x00000000
  0x0    Unused1                         [BitField(9-10):Unused1]: 0x00000000
  0x0    Valid                           [BitField(0-1):Valid]: 0x00000000
[_MMPTE_SUBSECTION Subsect] @ 0xf8a001ca4060
Offset             Field              Content
------ ------------------------------ -------
  0x-1   Subsection                     <_SUBSECTION Pointer to [0xFA8002A52EA8] (Pointer)>
  0x0    Protection                      [Enumeration:Enumeration]: 0x00000003 (MM_EXECUTE_READ)
  0x0    Prototype                       [BitField(10-11):Prototype]: 0x00000001
  0x0    SubsectionAddress               [BitField(16-64):SubsectionAddress]: 0xFA8002A52EA8
  0x0    Unused0                         [BitField(1-5):Unused0]: 0x00000000
  0x0    Unused1                         [BitField(11-16):Unused1]: 0x00000000
  0x0    Valid                           [BitField(0-1):Valid]: 0x00000000
Subsection PTE to file C:\Windows\System32\VBoxTray.exe @ 0x400
Physical Address 0x400 @ aff4://dea18f67-b60c-495f-9f23-ff3f2eeaf30b/C%3A%5CWindows%5CSystem32%5CVBoxTray.exe (Mapped 0x406eb5a4)

Deriving physical address from runtime physical address space:
Physical Address 0x400 @ aff4://dea18f67-b60c-495f-9f23-ff3f2eeaf30b/C%3A%5CWindows%5CSystem32%5CVBoxTray.exe (Mapped 0x406eb5a4)
In this example, the hardware PTE is recognized as a Prototype PTE (i.e. it is a symlink to the real PTE). The real PTE is, however, a _MMPTE_SUBSECTION PTE, which means it is simply a placeholder pointing at a _SUBSECTION structure which manages a mapping to the file C:\Windows\System32\VBoxTray.exe.
In this case, however, Rekall has the actual file in the AFF4 volume. It therefore can map it into the physical address space. A read() request will recover the relevant data directly from the mapped file!
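Conceptually, this is just a run-based view of the physical address space: extra runs, backed by acquired files rather than RAM, are stitched onto the memory image, and reads that fall inside such a run are served from the file. The sketch below illustrates the idea only; it is not Rekall's actual class, the names are made up, and reads spanning a run boundary are not handled:

class MappedPhysicalAS(object):
    """Toy run-based physical address space (illustration only)."""

    def __init__(self, memory_image):
        self.memory_image = memory_image   # file-like object holding physical memory
        self.runs = []                     # (phys_start, length, backing_file, file_offset)

    def add_file_mapping(self, phys_start, length, backing_file, file_offset=0):
        self.runs.append((phys_start, length, backing_file, file_offset))

    def read(self, offset, length):
        for start, run_len, backing, file_off in self.runs:
            if start <= offset < start + run_len:
                backing.seek(file_off + (offset - start))
                return backing.read(length)
        # Not covered by a file mapping: read from the memory image itself.
        self.memory_image.seek(offset)
        return self.memory_image.read(length)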
The vadmap plugin enumerates the state of each page in a process’s address space. This is very useful to see an overview of how pages are arranged in the process virtual address space. For example, examining the VBoxTray.exe process:
[1] test.aff4 09:28:56> vadmap 2084, start=0x00013fb90000
**************************************************
Pid: 2084 VBoxTray.exe
  Virt Addr        Length             Type         Comments
-------------- -------------- -------------------- --------
0x00013fb90000         0x1000 Valid                PhysAS @ 0x18f2e000
0x00013fb91000         0x1000 File Mapping         C:\Windows\System32\VBoxTray.exe @ 0x400 (P)
0x00013fb92000         0x1000 Valid                PhysAS @ 0x2ea47000
0x00013fb93000         0x1000 File Mapping         C:\Windows\System32\VBoxTray.exe @ 0x2400 (P)
0x00013fb94000         0x1000 Transition           PhysAS @ 0x31086000 (P)
0x00013fb95000         0x1000 File Mapping         C:\Windows\System32\VBoxTray.exe @ 0x4400 (P)
0x00013fb96000         0x8000 Valid                PhysAS @ 0x543b000
0x00013fb9e000         0x1000 File Mapping         C:\Windows\System32\VBoxTray.exe @ 0xd400 (P)
0x00013fb9f000         0x2000 Valid                PhysAS @ 0x2e65d000
0x00013fba1000         0x1000 File Mapping         C:\Windows\System32\VBoxTray.exe @ 0x10400 (P)
0x00013fba2000         0x1000 Valid                PhysAS @ 0x2e820000
If we wanted to dump the executable from memory, previously the dumpfiles plugin would dump the pages in the "Valid" or "Transition" state, but would have to zero pad the pages in the "File Mapping" state (since the data was not available). However, now that Rekall can map the acquired executable from disk into the gaps, the dumped executable is a combination of pages from disk and pages from memory. This is especially important if malware manipulates the code in memory (e.g. installing detour hooks or other code modifications) which are not present on disk. What we get now is the overlay of memory with the disk, as it is visible to the running system.

Live Analysis

The example above demonstrates how this works with an AFF4 image (once all mapped files have been captured). But the new address translation mechanism works just as well with live analysis using the WinPmem memory acquisition driver. In this case, Rekall is able to directly open any mapped files on demand - and even parse NTFS on the live system in order to recover locked files.
For example, consider the following (swapper.exe) program, which maps "notepad.exe" for reading (it is not actually running notepad, it is only mapped into the address space) and then reads some bytes from the third page. This causes some of the pages to be faulted in, but many of the mapped pages remain as Subsection PTEs.
char *create_file_mapping() {
    TCHAR *filename = L"c:\\windows\\notepad.exe";
    HANDLE h = CreateFile(filename, GENERIC_READ,FILE_SHARE_READ,NULL,OPEN_EXISTING,
                          FILE_FLAG_SEQUENTIAL_SCAN,NULL);

    DWORD size = GetFileSize(h, NULL);
    HANDLE hFileMapping = CreateFileMapping(h, NULL,PAGE_READONLY, 0, 0, NULL);
    if (h == INVALID_HANDLE_VALUE) {
       LogLastError();
    };

    char *view = (char*) MapViewOfFileEx(hFileMapping, FILE_MAP_READ, 0,  0,0,NULL);
    if (!view) {
        LogLastError();
    };

    // Read the third page of the file mapping.
    view += 0x1000 * 3;
    printf("Contents of %p %s\n", view, view);

    return view;
}
Let's examine what it looks like in the vad output (slightly truncated for brevity):
[1] pmem 21:08:05> vad 2668
Pid: 2668 swapper.exe
     VAD       lev   Start Addr      End Addr     com      Type             Protect        Filename
-------------- --- -------------- -------------- ------ -------------- -------------------- --------
......
0xfa80027775c0   5 0x000000300000 0x00000030ffff      8 Private        READWRITE
0xfa800262b2e0   6 0x000000310000 0x00000033ffff      0 Mapped         READONLY             \Windows\notepad.exe
0xfa8002d42170   4 0x000000370000 0x0000003effff      6 Private        READWRITE
.....

[1] pmem 21:08:08> vadmap 2668, start=0x000000310000
**************************************************
Pid: 2668 swapper.exe
  Virt Addr        Length             Type         Comments
-------------- -------------- -------------------- --------
0x000000310000         0x3000 File Mapping         C:\Windows\notepad.exe (P)
0x000000313000         0x1000 Valid                PhysAS @ 0x20b89000
0x000000314000         0x7000 Transition           PhysAS @ 0x3218a000 (P)
0x00000031b000        0x25000 File Mapping         C:\Windows\notepad.exe @ 0xb000 (P)
0x000000370000         0x6000 Valid                PhysAS @ 0x1e8bc000
0x000000376000        0x7a000 Demand Zero
0x0000005a0000         0x6000 Valid                PhysAS @ 0x2d057000
As we can see, the first 3 pages are merely mapped (i.e. not read), the next 8 pages are read into memory, and the rest of the file is also not read but mapped in. Let us examine the first page of the mapped file in detail:
[1] pmem 21:11:33> vtop 0x000000310000

******************** 0x310000 ********************
Virtual 0x000000310000 Page Directory 0x1b1da000
pml4e@ 0x1b1da000 = 0x700000297bd867
pdpte@ 0x297bd000 = 0xb00000054c1867
pde@ 0x54c1008 = 0x8c00000229bb867
pte@ 0x229bb880 = 0x0
[_MMPTE_SOFTWARE Soft] @ 0x0000229bb880
Offset             Field              Content
------ ------------------------------ -------
  0x0    InStore                         [BitField(22-23):InStore]: 0x00000000
  0x0    PageFileHigh                    [BitField(32-64):PageFileHigh]: 0x00000000
  0x0    PageFileLow                     [BitField(1-5):PageFileLow]: 0x00000000
  0x0    Protection                      [Enumeration:Enumeration]: 0x00000000 (MM_ZERO_ACCESS)
  0x0    Prototype                       [BitField(10-11):Prototype]: 0x00000000
  0x0    Reserved                        [BitField(23-32):Reserved]: 0x00000000
  0x0    Transition                      [BitField(11-12):Transition]: 0x00000000
  0x0    UsedPageTableEntries            [BitField(12-22):UsedPageTableEntries]: 0x00000000
  0x0    Valid                           [BitField(0-1):Valid]: 0x00000000
Consulting Vad: Prototype PTE is found in VAD
**************************************************
Pid: 2668 swapper.exe
     VAD       lev   Start Addr      End Addr     com      Type             Protect        Filename
-------------- --- -------------- -------------- ------ -------------- -------------------- --------
0xfa800262b2e0   6 0x000000310000 0x00000033ffff      0 Mapped         READONLY             \Windows\notepad.exe

_MMVAD.FirstPrototypePte: 0xf8a000ce6820
Prototype PTE is at virtual address 0xf8a000ce6820 (Physical Address 0x18540820)
[_MMPTE_SUBSECTION Subsect] @ 0xf8a000ce6820
Offset             Field              Content
------ ------------------------------ -------
  0x-1   Subsection                     <_SUBSECTION Pointer to [0xFA8000E032E0] (Pointer)>
  0x0    Protection                      [Enumeration:Enumeration]: 0x00000006 (MM_EXECUTE_READWRITE)
  0x0    Prototype                       [BitField(10-11):Prototype]: 0x00000001
  0x0    SubsectionAddress               [BitField(16-64):SubsectionAddress]: 0xFA8000E032E0
  0x0    Unused0                         [BitField(1-5):Unused0]: 0x00000000
  0x0    Unused1                         [BitField(11-16):Unused1]: 0x00000000
  0x0    Valid                           [BitField(0-1):Valid]: 0x00000000
Subsection PTE to file C:\Windows\notepad.exe @ 0x0
Physical Address 0x547f4000

Deriving physical address from runtime physical address space:
Physical Address 0x547f4000
Despite the PTE referring only to the file mapping, Rekall can find the file on disk (Rekall maps a view of the file into the physical address space), so if we now use the dump plugin to view a hexdump of that first page we can see the familiar MZ PE file header. It must be stressed that this data is not in memory at all: Rekall has recovered it from the disk itself - on demand, using live analysis.
[1] pmem 21:11:46> dump 0x000000310000
DEBUG:rekall.1:Running plugin (dump) with args ((3211264,)) kwargs ({})
    Offset                                   Data                                
-------------- ----------------------------------------------------------------- ---------------
      0x310000 4d 5a 90 00 03 00 00 00 04 00 00 00 ff ff 00 00  MZ.............. \Windows\notepad.exe
      0x310010 b8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00  ........@.......
      0x310020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      0x310030 00 00 00 00 00 00 00 00 00 00 00 00 e8 00 00 00  ................
      0x310040 0e 1f ba 0e 00 b4 09 cd 21 b8 01 4c cd 21 54 68  ........!..L.!Th
      0x310050 69 73 20 70 72 6f 67 72 61 6d 20 63 61 6e 6e 6f  is.program.canno
Rekall can similarly use the pagefile on a live system too. In that case Rekall reads the page file using the raw NTFS support - bypassing the OS APIs (which normally lock the pagefile while the system is running).

Test images

We did not reverse engineer any code in order to research the Windows address translation process. Instead we created a test program that generated known patterns of user space memory. We then ran the program and acquired the image. Our goal was to have Rekall reconstruct the known memory pattern, as a test of Rekall’s efficacy. The program was previously published here, so readers can repeat this test on their own.
To make it even easier to independently verify and discuss the Windows address translation process, we are now making a reference image available here. In this blog post we will examine the swapper_test_paged_pde.aff4 image in detail. Readers can replicate the analysis using Rekall 1.4 (Etzel) or later. We also hope that readers can use these images to test and evaluate other memory analysis tools. Tool testing and verification can only improve the general state of memory analysis tools.

Forensic provenance

Since we introduced complex, OS-specific address translation to Rekall, there was a need to explain the address translation process in detail. This improves forensic provenance and assists users in really understanding what Rekall is doing under the covers. We added the vtop plugin to Rekall for this purpose. To use this plugin, first switch into the desired process context, and then run the vtop plugin on a specific virtual address:
[1] swapper_test_paged_pde.aff4 17:24:41> cc proc_regex="swap"
Switching to process context: swapper.exe (Pid 2236@0xfa8000f47270)

[1] swapper_test_paged_pde.aff4 17:24:44> vtop 0x000074770000

******************* 0x74770000 *******************
Virtual 0x000074770000 Page Directory 0x33a5a000
pml4e@ 0x33a5a000 = 0x2a00000383a9867
pdpte@ 0x383a9008 = 0x1500000384b0867
pde@ 0x384b0d18 = 0x117000003369a867
pte@ 0x3369ab80 = 0xf8a001b759280400
[_MMPTE_PROTOTYPE Proto] @ 0x00003369ab80
Offset             Field              Content
------ ------------------------------ -------
  0x-1   Proto                          <_MMPTE Pointer to [0xF8A001B75928] (Pointer)>
  0x0    Protection                      [Enumeration:Enumeration]: 0x00000000 (MM_ZERO_ACCESS)
  0x0    ProtoAddress                    [BitField(16-64):ProtoAddress]: 0xF8A001B75928
  0x0    Prototype                       [BitField(10-11):Prototype]: 0x00000001
  0x0    ReadOnly                        [BitField(8-9):ReadOnly]: 0x00000000
  0x0    Unused0                         [BitField(1-8):Unused0]: 0x00000000
  0x0    Unused1                         [BitField(9-10):Unused1]: 0x00000000
  0x0    Valid                           [BitField(0-1):Valid]: 0x00000000
[_MMPTE_SUBSECTION Subsect] @ 0xf8a001b75928
Offset             Field              Content
------ ------------------------------ -------
  0x-1   Subsection                     <_SUBSECTION Pointer to [0xFA8000F75090] (Pointer)>
  0x0    Protection                      [Enumeration:Enumeration]: 0x00000001 (MM_READONLY)
  0x0    Prototype                       [BitField(10-11):Prototype]: 0x00000001
  0x0    SubsectionAddress               [BitField(16-64):SubsectionAddress]: 0xFA8000F75090
  0x0    Unused0                         [BitField(1-5):Unused0]: 0x00000000
  0x0    Unused1                         [BitField(11-16):Unused1]: 0x00000000
  0x0    Valid                           [BitField(0-1):Valid]: 0x00000000
Subsection PTE to file C:\Users\mic\msvcr100.dll @ 0x0
Consider the example above. We first switch to the process context of the process with the name matching "swap". We can then see that Rekall translates the pml4e, pdpte and pde to arrive at the pte. The pte contains the value 0xf8a001b759280400, which Rekall identifies as being in the PROTOTYPE state (as described previously, a prototype PTE is like a symlink to another PTE which describes the real state of this virtual address).
Rekall then prints the _MMPTE_PROTOTYPE record indicating that the real PTE is found in virtual address 0xF8A001B75928. Rekall then identifies that PTE as a Subsection PTE and prints its content (A Subsection PTE is a placeholder for file mappings). The _MMPTE_SUBSECTION has a pointer to the subsection object for this file mapping.
Finally, in this case, Rekall does not have the file itself, hence we can not retrieve the content of this virtual address (On a real system, accessing the virtual address will cause the page fault handler to read the file into memory).
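To summarize the decision logic, here is a much-simplified sketch of how the bits of an invalid PTE (using the bit positions visible in the field listings above) steer the translation. Rekall's real algorithm handles considerably more states and OS-version differences:

def classify_pte(pte):
    if pte & 0x1:                  # Valid bit (0): a hardware PTE pointing at RAM
        return "valid"
    if (pte >> 10) & 0x1:          # Prototype bit (10): follow ProtoAddress
        return "prototype"         # (may end up at a _MMPTE_SUBSECTION -> file mapping)
    if (pte >> 11) & 0x1:          # Transition bit (11): the data is still in RAM
        return "transition"
    if pte >> 32:                  # PageFileHigh (bits 32-64): look in the pagefile
        return "pagefile"
    return "demand zero / consult VAD"

# classify_pte(0xf8a001b759280400) == "prototype"
# classify_pte(0x1cee00000080)     == "pagefile"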
That was a simple example. Let's look at a more complex one:
[1] swapper_test_paged_pde.aff4 17:41:12> vtop 0x000000600000
******************** 0x600000 ********************
Virtual 0x000000600000 Page Directory 0x33a5a000
pml4e@ 0x33a5a000 = 0x2a00000383a9867
pdpte@ 0x383a9000 = 0x2f0000038a6c867
pde@ 0x38a6c018 = 0x213ff00200080
[_MMPTE_SOFTWARE Soft] @ 0x000038a6c018
Offset             Field              Content
------ ------------------------------ -------
  0x0    InStore                         [BitField(22-23):InStore]: 0x00000000
  0x0    PageFileHigh                    [BitField(32-64):PageFileHigh]: 0x000213FF
  0x0    PageFileLow                     [BitField(1-5):PageFileLow]: 0x00000000
  0x0    Protection                      [Enumeration:Enumeration]: 0x00000004 (MM_READWRITE)
  0x0    Prototype                       [BitField(10-11):Prototype]: 0x00000000
  0x0    Reserved                        [BitField(23-32):Reserved]: 0x00000000
  0x0    Transition                      [BitField(11-12):Transition]: 0x00000000
  0x0    UsedPageTableEntries            [BitField(12-22):UsedPageTableEntries]: 0x00000200
  0x0    Valid                           [BitField(0-1):Valid]: 0x00000000
Pagefile (0) @ 0x213ff000
pte@ 0x213ff000 @ aff4://c7201492-0876-45f4-ba90-a7cccec6453d/c:/pagefile.sys (Mapped 0x613ff000) = 0x1cee00000080
[_MMPTE_SOFTWARE Soft] @ 0x0000613ff000
Offset             Field              Content
------ ------------------------------ -------
  0x0    InStore                         [BitField(22-23):InStore]: 0x00000000
  0x0    PageFileHigh                    [BitField(32-64):PageFileHigh]: 0x00001CEE
  0x0    PageFileLow                     [BitField(1-5):PageFileLow]: 0x00000000
  0x0    Protection                      [Enumeration:Enumeration]: 0x00000004 (MM_READWRITE)
  0x0    Prototype                       [BitField(10-11):Prototype]: 0x00000000
  0x0    Reserved                        [BitField(23-32):Reserved]: 0x00000000
  0x0    Transition                      [BitField(11-12):Transition]: 0x00000000
  0x0    UsedPageTableEntries            [BitField(12-22):UsedPageTableEntries]: 0x00000000
  0x0    Valid                           [BitField(0-1):Valid]: 0x00000000
Pagefile (0) @ 0x1cee000
Physical Address 0x1cee000 @ aff4://c7201492-0876-45f4-ba90-a7cccec6453d/c:/pagefile.sys (Mapped 0x41cee000)
In this example, Rekall identifies that the PDE at physical address 0x38a6c018 contains 0x213ff00200080. Since the PDE does not have bit 0 set, it is not valid. However, Rekall identifies that the PTE table resides in the pagefile at offset 0x213ff000. Note how Rekall maps the pagefile into the physical address space: with the pagefile mapped in, the address translation process can simply refer to it by a single physical offset.
Rekall then reads the value of the PTE (from the pagefile) and finds that it is 0x1cee00000080. This again refers to the pagefile, this time at address 0x1cee000.
Note that in the second example we consulted the pagefile twice - once when reading the PTE table (referenced by a paged-out PDE) and once when resolving the actual PTE, which also refers to the pagefile. Being able to see the full translation process at work is extremely useful. As forensic analysts we must justify how we arrive at our conclusions, and the vtop plugin allows us to do this.
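The pagefile offsets above fall straight out of the _MMPTE_SOFTWARE fields. A small illustration (not Rekall code) of the arithmetic:

def pagefile_location(soft_pte):
    pagefile_number = (soft_pte >> 1) & 0xF     # PageFileLow (bits 1-5): which pagefile
    pagefile_offset = (soft_pte >> 32) << 12    # PageFileHigh (bits 32-64): page -> byte offset
    return pagefile_number, pagefile_offset

# pagefile_location(0x213ff00200080) == (0, 0x213ff000)  # the paged-out PTE table
# pagefile_location(0x1cee00000080)  == (0, 0x1cee000)   # the final data page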

How important is this?

We were previously surprised that correct address translation has not been implemented by other tools, and in particular by the lack of tools that are able to use the pagefile during analysis. Additionally, other researchers theorized that smear would be a big problem - there is a reasonably long time difference between acquiring the memory and acquiring the pagefile itself. We ourselves have previously observed that page tables may change between the two times, causing the physical memory to be out of sync with the pagefile (we describe this as pagetable smear in the paper).
We wanted to check how many pages from the known VAD region can be recovered with and without the pagefile. We used the Rekall vaddump plugin to dump all VAD regions for the swapper.exe process. We can then test how many pages were as expected and how many were incorrect (possibly due to smear) using the following Python script:
import sys
import struct

i = errors = success = 0
# Each page of the dumped region is expected to start with its own page index
# as a 64-bit little-endian integer; count how many pages match.
with open(sys.argv[1], "rb") as fd:
     while 1:
        i += 1
        fd.seek(i * 0x1000)
        data = fd.read(8)
        if not data: break

        unpacked_data = struct.unpack("<Q", data)[0]
        if unpacked_data != i:
            errors += 1
        else:
            success += 1

print "Total errors: %s, Total success: %s" % (errors, success)
When using the pagefile, Rekall could correctly recover all but 3691 pages out of 202400 (error rate of 1.8%). However, without the pagefile, Rekall could only recover 119198 out of 202400 (41% error rate). We attribute most of the errors to acquisition smear in the case where the pagefile was used. However, this demonstrates that the pagefile is critical to collect and analyze - almost half the pages of interest were in the pagefile.

The AFF4 acquisition plugin

Our goal in acquisition is to preserve as much of the system state as possible for later analysis. As we have seen, from the point of view of the address translation process, the system state comprises:
  1. The physical memory.
  2. The pagefile.
  3. Any mapped files.
Previously, we used a dedicated imaging tool to acquire memory and the pagefile on the side. For example, the WinPmem 2.0.1 acquisition tool was written in C++ and acquired physical memory, while shelling out to the Sleuthkit’s fcat program to parse the NTFS file system when acquiring the pagefile (The pagefile is locked during normal system operation and can not be opened via the normal system APIs).
Quite independent of that, Rekall had for a long time the ability to perform live analysis: The raw physical memory device was used as a kind of memory image, and Rekall could perform triage live analysis without having to acquire memory first.
We have realized that in order to best preserve system state, especially on busy systems, we should combine the two approaches! Rekall can start acquiring physical memory, then analyze the running system to determine which files are mapped and should be acquired additionally. Rekall now even parses the NTFS file system directly, and therefore can acquire locked files without using the OS APIs (there is no need to shell out to the Sleuthkit).
The final product is therefore an AFF4 volume containing physical memory as well as any mapped files and pagefile from the system. The AFF4 imaging format provides us with the required features, such as compression, sparse images (Physical memory is often sparse) and the ability to store multiple data streams in the same image format.
Now we can write a sophisticated memory acquisition tool right inside Rekall instead of having to rely on a dedicated imager. This is more powerful, since we can leverage the triage and analysis capability in Rekall. It does come at the cost of increased complexity in the acquisition tool, and a potentially increased footprint due to the larger executable size. However, we believe this is a good trade-off: even if memory is forced to be swapped due to the increased footprint, we can just recover it from the pagefile anyway, so we do not actually lose anything. We believe that when acquiring the pagefile and mapped files, memory pressure and a small tool footprint are less important concerns. We will continue to maintain the old single-program acquisition tool, which might be useful in some situations.
Consider acquiring memory now from the command line:
D:\AMD64>winpmem_2.0.1.exe -l
Driver Unloaded.
CR3: 0x0000187000
 2 memory ranges:
Start 0x00001000 - Length 0x0009E000
Start 0x00100000 - Length 0x3FEF0000
Memory access driver left loaded since you specified the -l flag.

D:\AMD64>rekal -f \\.\pmem aff4acquire c:\temp\image.aff4 --also_files
Will use compression: https://github.com/google/snappy
Imaging Physical Memory:
 WinPmemAddressSpace: Wrote 0xc000000 (200 mb total) (11 Mb/s)
...
Wrote 1023 mb of Physical Memory to aff4://6567a47c-dd5d-4060-9a39-399fe735d959/PhysicalMemory
Imaging pagefile C:\pagefile.sys
 pagefile.sys: Wrote 0x20b5a000 (548 total) (22 Mb/s)
Wrote pagefile.sys (1000 mb)
Adding file C:\Windows\System32\ntdll.dll
Adding file C:\Windows\SysWOW64\ntdll.dll
Adding file C:\Windows\System32\smss.exe
Adding file C:\Windows\System32\apisetschema.dll
Adding file C:\Windows\System32\locale.nls
Adding file C:\Windows\System32\en-US\cmd.exe.mui
Adding file C:\Windows\Globalization\Sorting\SortDefault.nls
Adding file C:\Windows\System32\kernel32.dll
...
Let's examine the content of the AFF4 image using the aff4imager tool:
$ aff4imager -V image.aff4
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix aff4: <http://aff4.org/Schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<aff4://6567a47c-dd5d-4060-9a39-399fe735d959/C%3A%5CProgram%20Files%20%28x86%29%5CMiranda%20IM%5CPlugins%5CGG.dll>
    aff4:chunk_size 32768 ;
    aff4:chunks_per_segment 1024 ;
    aff4:compression <https://www.ietf.org/rfc/rfc1950.txt> ;
    aff4:original_filename "C:\\Program Files (x86)\\Miranda IM\\Plugins\\GG.dll"^^xsd:string ;
    aff4:size 316416 ;
    aff4:stored <aff4://6567a47c-dd5d-4060-9a39-399fe735d959> ;
    a aff4:image .

...
<aff4://6567a47c-dd5d-4060-9a39-399fe735d959/PhysicalMemory>
    aff4:category <http://aff4.org/Schema#memory/physical> ;
    aff4:stored <aff4://6567a47c-dd5d-4060-9a39-399fe735d959> ;
    a aff4:map .

...
<aff4://6567a47c-dd5d-4060-9a39-399fe735d959/C%3A%5Cpagefile.sys>
    aff4:chunk_size 32768 ;
    aff4:chunks_per_segment 1024 ;
    aff4:compression <https://github.com/google/snappy> ;
    aff4:original_filename "C:\\pagefile.sys"^^xsd:string ;
    aff4:size 1380974592 ;
    aff4:stored <aff4://6567a47c-dd5d-4060-9a39-399fe735d959> ;
    a aff4:image .
We can see an example of a file stream ("C:\\Program Files (x86)\\Miranda IM\\Plugins\\GG.dll"), as well as the physical memory stream and the acquired pagefile.

Conclusions

In this blog post we discussed Rekall’s advanced virtual address translation algorithms. To our knowledge, Rekall is the only open source memory analysis framework that incorporates the pagefile and mapped files into analysis. We also discussed the new aff4acquire plugin, which aims to simplify the process of memory acquisition and ensure that more of the relevant evidence is collected automatically at acquisition time, so that subsequent analysis can be more complete.
We have also shared some test images, and examined some cases where address translation is particularly complex.
We hope to have convinced you, the reader, that properly supporting the pagefile is critical for accurate memory analysis! In Rekall we choose to build a strong and solid foundation on top of which we can develop better memory analysis techniques. In the next blog post we will discuss how this foundation can be used to reliably analyze user space heap allocations.