Easy Hypervisor Heap Visualization with PyPANDA and HeapInspect
TLDR
I stumbled across heapinspect, a fantastic heap visualizer project that uses the Linux proc filesystem to interact with the system instead of integrating with a larger project like gdb. That portable design allowed me to "hollow out" the methods that mattered and integrate it with PyPANDA. With our integration we can construct heap visualizations of userland processes inside the emulated guest in PANDA.
Currently the code is set up to work on x86 and x86_64, but it should be trivial to extend to other architectures.
Take a look at the code or see it in action with asciinema below.
For more information on PyPANDA:
Introduction
A few weeks back I was looking through an example in how2heap when one of their recommended tools caught my eye. It said:
heapinspect: A Python based heap playground with good visualization for educational purposes https://github.com/matrix1001/heapinspect
I have looked at a number of heap visualization tools built into gdb or set up as standalone web apps, so I was surprised to see that heapinspect listed in its features that it was "Free of gdb and other requirement". Because heapinspect did not require execution in the context of some larger tool (e.g. gdb), it was absolutely perfect for scooping up and integrating with PyPANDA.
HeapInspect Overview
heapinspect needs four basic information-gathering methods to function:
- A method to get virtual memory mappings
- A method to read virtual memory
- Some information about the heap libc implementation
- A method to determine the current architecture
Normally, heapinspect fulfills these requirements as follows:
- reads virtual memory mappings through /proc/PID/maps [1]
- reads virtual memory via /proc/PID/mem [2]
- runs a custom program called libc_info to gather information about the heap [3]
- determines the architecture by reading the ELF header for libc [4]
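To make the first and last of those points concrete, here is a minimal sketch of what that kind of parsing looks like. It is not heapinspect's own code: the names MapEntry, parse_maps_line, and arch_from_elf_header are made up for illustration, and returning "32" or "64" is an assumption about a convenient format. The /proc/PID/maps line layout and the ELF e_machine field are standard.

```python
import struct
from collections import namedtuple

# Hypothetical record type for illustration; heapinspect defines its own Map class.
MapEntry = namedtuple("MapEntry", ["start", "end", "perm", "name"])

# Standard ELF e_machine values for the two architectures discussed here.
EM_386 = 3
EM_X86_64 = 62


def parse_maps_line(line):
    """Parse one line of /proc/PID/maps into a MapEntry."""
    # Layout: start-end perms offset dev inode [pathname]
    fields = line.split(None, 5)
    start_s, end_s = fields[0].split("-")
    name = fields[5].strip() if len(fields) > 5 else ""
    return MapEntry(int(start_s, 16), int(end_s, 16), fields[1], name)


def arch_from_elf_header(header):
    """Classify the first bytes of an ELF file as '32' or '64' (assumed format)."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] is byte 5: 1 means little endian, 2 means big endian.
    endian = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 0x12.
    (e_machine,) = struct.unpack_from(endian + "H", header, 0x12)
    if e_machine == EM_386:
        return "32"
    if e_machine == EM_X86_64:
        return "64"
    raise ValueError("unsupported e_machine value: %d" % e_machine)


libc_line = "7f1c2a000000-7f1c2a1e7000 r-xp 00000000 08:01 1234  /lib/x86_64-linux-gnu/libc-2.27.so\n"
libc_map = parse_maps_line(libc_line)
```

Everything here depends on a real /proc and a real libc file on disk, which is exactly what we do not have from outside the guest.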
Replacing with PyPANDA logic
To get heapinspect working with PyPANDA, all we need to do is replace these functions with ones that are compatible with PyPANDA and write a PyPANDA script to call the heapinspect functions at some interesting point in execution.
Reading virtual memory mappings
PANDA uses the osi_linux plugin to find memory mappings based on kernel structures located by a kernel profile. PyPANDA makes this information very easy to access and use. We replace the old vmmap function with one that calls get_mappings and adds each mapping to a list that we return.
def vmmap(pid, panda=None):
    ...
    for mapping in panda.get_mappings(panda.get_cpu()):
        start = mapping.base
        end = mapping.base + mapping.size
        perm = 'rwx'  # not used so we set rwx
        # decode C string to Python string
        name = panda.ffi.string(mapping.name).decode()
        maps.append(Map(start, end, perm, name))
    ...
    return maps
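Once the mappings are in a plain list, the rest of the tool only needs simple lookups over it. The sketch below shows two such lookups. The namedtuple is a stand-in for heapinspect's Map, and find_base and mapping_for are hypothetical helpers rather than functions from either project.

```python
from collections import namedtuple

# Stand-in for heapinspect's Map record, assumed to carry these four fields.
Map = namedtuple("Map", ["start", "end", "perm", "name"])


def find_base(maps, needle):
    """Return the lowest start address among mappings whose name contains needle."""
    starts = [m.start for m in maps if needle in m.name]
    return min(starts) if starts else None


def mapping_for(maps, addr):
    """Return the mapping that contains addr, or None if it is unmapped."""
    for m in maps:
        if m.start <= addr < m.end:
            return m
    return None


# Made-up addresses, in the shape vmmap would return them.
sample_maps = [
    Map(0x400000, 0x401000, "rwx", "/bin/ls"),
    Map(0x7F0000200000, 0x7F0000400000, "rwx", "libc-2.27.so"),
    Map(0x7F0000000000, 0x7F0000200000, "rwx", "libc-2.27.so"),
    Map(0x602000, 0x623000, "rwx", "[heap]"),
]
libc_base = find_base(sample_maps, "libc")
```

Note that the permissions are never consulted, which is why hardcoding 'rwx' in our vmmap replacement is harmless.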
Reading virtual memory
PyPANDA makes reading virtual memory easy. It provides the virtual_memory_read function, which takes the relevant cpu, address, and length. Modeling this particular read was slightly complicated because the rest of the program depends on two behaviors: we must return an empty string if the requested range is completely unmapped, but fill in zeros where the range is only partially mapped (has gaps).
def read(self, addr, size):
    any_output = False
    output = b""
    cpu = self.panda.get_cpu()
    while len(output) < size:
        # read up to the next page boundary, or the amount left if smaller,
        # so an unmapped page cannot fail the read of a mapped neighbor
        amount_to_read = min(0x1000 - (addr & 0xfff), size - len(output))
        try:
            output += self.panda.virtual_memory_read(cpu, addr, amount_to_read)
            any_output = True
        except Exception:
            # virtual_memory_read throws an error if the address is unmapped
            # we fill in zeros, but don't mark that we've received output
            output += b"\x00" * amount_to_read
        # we advance by what we handled, whether or not the read succeeded
        addr += amount_to_read
    # only return result if we actually read something
    return output if any_output else ''
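The two edge cases are easy to check without booting a guest. The sketch below swaps in a stub for the Panda object. FakePanda and Reader are hypothetical names, the stub models only the two calls the read method uses, and raising ValueError for unmapped memory is the stub's choice rather than a statement about PyPANDA. The read loop is a page-bounded variant of the one above.

```python
PAGE = 0x1000


class FakePanda:
    """Hypothetical stub: models only get_cpu and virtual_memory_read."""

    def __init__(self, pages):
        # pages maps a page-aligned address to exactly PAGE bytes of content
        self.pages = pages

    def get_cpu(self):
        return None  # the stub ignores the cpu argument

    def virtual_memory_read(self, cpu, addr, length):
        out = b""
        while length > 0:
            base = addr & ~(PAGE - 1)
            if base not in self.pages:
                raise ValueError("unmapped address")
            offset = addr - base
            chunk = self.pages[base][offset:offset + length]
            out += chunk
            addr += len(chunk)
            length -= len(chunk)
        return out


class Reader:
    def __init__(self, panda):
        self.panda = panda

    def read(self, addr, size):
        any_output = False
        output = b""
        cpu = self.panda.get_cpu()
        while len(output) < size:
            # stop each read at the next page boundary
            amount = min(PAGE - (addr % PAGE), size - len(output))
            try:
                output += self.panda.virtual_memory_read(cpu, addr, amount)
                any_output = True
            except ValueError:
                # unmapped: pad with zeros but do not count it as output
                output += b"\x00" * amount
            addr += amount
        return output if any_output else ''


reader = Reader(FakePanda({0x1000: b"A" * PAGE, 0x3000: b"C" * PAGE}))
gap = reader.read(0x1000, 3 * PAGE)      # mapped, hole, mapped
nothing = reader.read(0x8000, 2 * PAGE)  # nothing mapped at all
```

Here gap comes back as three pages with zeros in the middle, while nothing is the empty string that signals a completely unmapped range.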
Gathering information about the heap implementation
There are a lot of ways of handling this particular issue. One would be to run libc_info inside the guest, since there are relatively easy ways to run binaries from your host on the PyPANDA machine. We sidestepped the problem by adding an option to specify the heap implementation details as arguments, on the assumption that one has already run libc_info on a matching system ahead of time.
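As a rough idea of what such an option could look like, here is a small argparse sketch. The flag names --main-arena-offset and --tcache and the function parse_arena_info are hypothetical. The two dictionary keys are the ones the hook later in this post passes to HeapInspector.

```python
import argparse


def parse_arena_info(argv):
    """Build the arena_info dict from command-line arguments (hypothetical flags)."""
    parser = argparse.ArgumentParser(
        description="heap implementation details gathered ahead of time"
    )
    parser.add_argument(
        "--main-arena-offset",
        type=lambda s: int(s, 0),  # base 0 accepts decimal or 0x-prefixed hex
        required=True,
        help="main arena offset as reported ahead of time",
    )
    parser.add_argument(
        "--tcache",
        action="store_true",
        help="set if the target libc has tcache enabled",
    )
    ns = parser.parse_args(argv)
    return {
        "main_arena_offset": ns.main_arena_offset,
        "tcache_enable": ns.tcache,
    }


arena_info = parse_arena_info(["--main-arena-offset", "4111432", "--tcache"])
```

The values are only meaningful for the exact libc build they were gathered from, so they need to be collected again for each guest image.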
A method to determine architecture
Determining the architecture is trivial for our use case because we are running our system within the context of a specific virtual machine. The Panda class maintains a reference to the machine type, so we can just pass that information along and format it as heapinspect might want.
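The formatting step can be as small as a lookup table. In the sketch below, heapinspect_arch is a hypothetical helper. It assumes the machine type arrives as a string such as "i386" or "x86_64" and that heapinspect wants "32" or "64", and both of those are assumptions to check against the real code.

```python
def heapinspect_arch(panda_arch_name):
    """Translate a machine type string into an assumed heapinspect arch string."""
    table = {
        "i386": "32",
        "x86_64": "64",
    }
    try:
        return table[panda_arch_name]
    except KeyError:
        # Other architectures would need their own entry and heap layout support.
        raise ValueError("unsupported architecture: %s" % panda_arch_name) from None


arch = heapinspect_arch("i386")
```

Extending to another architecture would start with a new entry in this table.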
Adding to a PyPANDA plugin
Next, we pick an interesting place in our analysis to dissect. I made use of the new library-level symbol hooking functionality to hook libc:malloc and do our visualization there.
Most of the code here comes from the heapinspect demo. However, we place it inside the symbol hook described above and modify it to take our arena_info, the heap implementation details we decided to pass as arguments. We also pass in the Panda object itself as an argument.
@panda.hook_symbol("libc", "malloc")
def hook(cpu, tb, h):
    print(f"Caught libc:malloc in {panda.get_process_name(cpu)}")
    try:
        global pid, args
        arena_info = {"main_arena_offset": 4111432, "tcache_enable": True}
        hi = HeapInspector(0, panda=panda, arena_info=arena_info)
        if args.rela:
            hs = HeapShower(hi)
            hs.relative = True
            if args.x:
                print(hs.heap_chunks)
                print(hs.fastbins)
                print(hs.unsortedbins)
                print(hs.smallbins)
                print(hs.largebins)
                print(hs.tcache_chunks)
        elif args.raw:
            hs = HeapShower(hi)
            if args.x:
                print(hs.heap_chunks)
                print(hs.fastbins)
                print(hs.unsortedbins)
                print(hs.smallbins)
                print(hs.largebins)
                print(hs.tcache_chunks)
        else:
            pp = PrettyPrinter(hi)
            print(pp.all)
    except Exception as e:
        raise e
    h.enabled = False
    panda.end_analysis()
Finally, we combine this callback with a machine willing to run it. We use PyPANDA's built-in guest interaction code to boot up a virtual machine, restore to a snapshot, and run some commands.
from pandare import Panda
panda = Panda(generic="i386")

# above section would go here in the script

@panda.queue_blocking
def guest_interaction():
    # revert to root snapshot
    panda.revert_sync("root")
    # make sure the machine does something
    panda.run_serial_cmd("ls -la && whoami")
    # shut down the machine when we're done
    panda.end_analysis()

panda.run()
Conclusions
PyPANDA enables integration with powerful and interesting tooling. This was a fun tool to develop and the implementation was very fast. The writeup for this blog took far longer than the implementation.