PANDA Web server live process graph
Introduction
In this post I am going to demonstrate the power of PyPANDA by starting up an x86_64 virtual machine, tracking its tasks through Operating System Introspection (OSI), and serving that information from a Flask-powered web server that renders the processes with graphviz and automatically updates the graph over asyncio. Both the web server and the virtual machine run in the same process and Python script.
Take a look at the full source code here. Grab the docker image here.
Screen recording/demo
Background
I worked on a Python extension to PANDA (pypanda) as my Senior Design Project at Rose-Hulman Institute of Technology. The project has come a long way and I am very proud of it, but it would not be where it is today without the work my fantastic colleagues have put into it. For those unfamiliar with PANDA, this is the description on GitHub:
PANDA is an open-source Platform for Architecture-Neutral Dynamic Analysis. It is built upon the QEMU whole system emulator, and so analyses have access to all code executing in the guest and all data.
PyPANDA gives you the capability to register callbacks from PANDA, interact with the serial console, and issue commands to the monitor all from Python.
PyPANDA process list capture
First off, we need to set up PANDA from Python.
from panda import Panda

# the Panda object sets up virtual machine options
# -nographic is analogous to -display none (text console)
panda = Panda("x86_64", mem="1G", expect_prompt=rb"root@ubuntu:.*",
              qcow="bionic-server-cloudimg-amd64-noaslr-nokaslr.qcow2",
              extra_args=["-nographic"])
# set our OS name for OSI
panda.set_os_name("linux-64-ubuntu:4.15.0-72-generic-noaslr-nokaslr")
# start running our machine
panda.run()
If you'd like a qcow to start off with you can get the one I am using here. We host some qcows at panda-re.mit.edu/qcows/.
This is all you need to initialize a virtual machine, but for our analysis we'd like our process list to do something, so let's start up some processes. Our run_commands function here runs asynchronously with the main thread. It controls both the QEMU monitor and the serial console. It first reverts to a snapshot called "root". Then it starts processes in a loop by typing a command into the serial console, pressing enter, and waiting for and printing the output.
from panda import blocking

@blocking
def run_commands():
    panda.revert_sync("root")
    print(panda.run_serial_cmd("uname -a"))
    print(panda.run_serial_cmd("ls -la"))
    print(panda.run_serial_cmd("whoami"))
    print(panda.run_serial_cmd("date"))
    print(panda.run_serial_cmd("uname -a | tee /asdf"))

# queue our blocking function
panda.queue_async(run_commands)
Next, we want to find a way to capture our process list. A naive approach would be to use the PANDA syscalls2 plugin to listen for all sys_execve and sys_exit syscalls. It gets slightly tricky because sys_exit doesn't tell us which program we are. We can map processes by their ASID (Address Space IDentifier). We can do this by adding the following code to our example above:
# our list of processes
processes = set()
cr3_mapping = {}

# called every time our system calls sys_execve
@panda.ppp("syscalls2", "on_sys_execve_enter")
def on_sys_execve_enter(cpu, pc, pathname, argv, envp):
    # read the program name from guest memory
    program_name = panda.read_str(cpu, pathname)
    # add to our process list
    processes.add(program_name)
    # record our cr3 so we know which process to
    # delete when sys_exit is called
    cr3 = cpu.env_ptr.cr[3]
    cr3_mapping[cr3] = program_name
    print(f"Got new program called {program_name}")

# called when sys_exit is called
@panda.ppp("syscalls2", "on_sys_exit_enter")
def on_sys_exit_enter(cpu, pc, exit_code):
    # fetch our cr3 to identify the process
    cr3 = cpu.env_ptr.cr[3]
    process = cr3_mapping[cr3]
    # remove our process from our list
    processes.remove(process)
    print(f"Removed {process} from process list")
Would that it were so simple! Watching sys_execve and sys_exit misses many events that we would want reflected in a process list, such as processes forking and processes changing their names. We could extend our model to handle sys_fork, sys_vfork, and sys_prctl (for name changes) and process all of those, but it begins to get tedious. Most importantly, this approach prevents us from starting at a snapshot and knowing what processes have already been started.
To make it easier to gather process lists, PANDA has Operating System Introspection (OSI) utilities that let us iterate the list of task_struct entries in the kernel directly. PANDA analysis is done via callbacks, so we need to pick a reasonable callback for regular analysis. We choose asid_changed, which gets called whenever the ASID changes on our underlying system.
from panda import ffi

processes = set()

@panda.cb_asid_changed
def asid_changed(env, old_asid, new_asid):
    global processes
    processes_new = set()
    # get our processes from OSI
    for process in panda.get_processes(env):
        # use FFI to read our string from memory
        process_name = ffi.string(process.name)
        # add it to our set
        processes_new.add(process_name)
        print(f"{process_name}_{process.pid}")
    processes = processes_new
    return 0
This is the most basic version of our plugin. It receives a list of processes from the osi plugin, parses it, and replaces the old list. This would work fine if you just wanted to print out a process list, but it's not very interesting: it doesn't construct the process tree.
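The missing piece is joining each process to its parent through its PID. Outside of PANDA, the idea boils down to a dictionary join; a minimal sketch with invented (pid, ppid, name) tuples standing in for OSI results:

```python
# Hypothetical scan results: (pid, ppid, name) tuples standing in
# for what OSI would hand us.
procs = [(1, 0, "init"), (2, 1, "bash"), (3, 2, "vim")]

# Index processes by PID so each child can find its parent.
by_pid = {pid: name for pid, ppid, name in procs}

tree = {}
for pid, ppid, name in procs:
    if ppid in by_pid:  # skip the root, whose parent is outside the scan
        tree.setdefault(by_pid[ppid], []).append(name)

print(tree)  # {'init': ['bash'], 'bash': ['vim']}
```

The Process class below wraps this same join in an object, with parent and children links instead of a dictionary.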
We first construct a Python class to handle our processes. It largely encapsulates a tree structure, but also has methods to make things easier on us. In particular, our equality method is a bit different than you might expect. This is because we would like to uniquely identify tasks by the create_time member of the OSIProc struct, which reflects the start_time field of task_struct in the Linux kernel. It is a great way to uniquely identify tasks; nearly every other available field does not actually uniquely identify a process. However, our process tree includes kernel tasks, which may share the same start_time. For this reason we handle the case differently if an object is a child of kthreadd, the kernel task daemon.
class Process(object):
    def __init__(self, proc_object):
        self.pid = proc_object.pid
        self.ppid = proc_object.ppid
        self.start_time = proc_object.create_time
        self.name = ffi.string(proc_object.name).decode()
        self.children = set()
        self.parent = None

    @property
    def depth(self):
        if self.parent is self or not self.parent:
            return 1
        return 1 + self.parent.depth

    def add_child(self, other):
        # avoid self loops
        if other is not self:
            self.children.add(other)

    def is_kernel_task(self):
        if "kthreadd" in self.name:
            return True
        if self.parent and self.parent is not self:
            return self.parent.is_kernel_task()
        return False

    def __eq__(self, other):
        if not isinstance(other, Process):
            return False
        # start_times can collide with kernel tasks
        if not self.is_kernel_task():
            return self.start_time == other.start_time
        return self.pid == other.pid

    def __hash__(self):
        # defining __eq__ disables the default __hash__, and we keep
        # these objects in sets, so hash on start_time (equal objects
        # always share a start_time, even in the kernel-task case)
        return hash(self.start_time)

    def __str__(self):
        # replace ":" with "" because graphviz messes ":" up
        return f"{self.name}_{hex(self.start_time)[2:]}".replace(":", "")
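To see the equality rule in action without PANDA, here is a quick stand-in; MiniProc and all of its values are invented for illustration and keep only the fields the equality logic touches:

```python
# Simplified stand-in for the Process class above (hypothetical,
# no PANDA or FFI required).
class MiniProc:
    def __init__(self, pid, start_time, name, parent=None):
        self.pid, self.start_time, self.name = pid, start_time, name
        self.parent = parent

    def is_kernel_task(self):
        if "kthreadd" in self.name:
            return True
        if self.parent and self.parent is not self:
            return self.parent.is_kernel_task()
        return False

    def __eq__(self, other):
        # userland tasks compare by start_time, kernel tasks by pid
        if not self.is_kernel_task():
            return self.start_time == other.start_time
        return self.pid == other.pid

# the same bash task observed in two different scans
bash_a = MiniProc(100, 0x5f3a, "bash")
bash_b = MiniProc(100, 0x5f3a, "bash")
print(bash_a == bash_b)  # True: identical start_time

# two kernel tasks whose start_times collide at 0
kthreadd = MiniProc(2, 0, "kthreadd")
kworker = MiniProc(7, 0, "kworker/0:1", parent=kthreadd)
print(kworker == kthreadd)  # False: kernel tasks fall back to pid
```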
Our asid_changed function, which regularly polls the process list, must construct the tree and maintain lists of processes that started and exited so we can update old graphs. These changes will drive our asyncio events: each connection made to us gets its own stream of events and receives regular updates from these changes in the tree.
# per-connection queues of graph updates (filled here, drained by
# the web server's event threads)
nodes_to_add = {}
nodes_to_remove = {}

@panda.cb_asid_changed
def asid_changed(env, old_asid, new_asid):
    global processes
    # get all unique processes
    new_processes = set()
    # make a mapping from PID -> process
    pid_mapping = {}
    for process in panda.get_processes(env):
        proc_obj = Process(process)
        new_processes.add(proc_obj)
        pid_mapping[proc_obj.pid] = proc_obj
    # iterate over our processes again, from low to high PID,
    # and add the parent <-> child relation
    processes_to_consider = list(new_processes)
    processes_to_consider.sort(key=lambda x: x.pid)
    for process in processes_to_consider:
        parent = pid_mapping[process.ppid]
        process.parent = parent
        parent.add_child(process)
    # convert back to a set
    proc_new = set(processes_to_consider)
    # python lets us do set difference with subtraction
    # these are the changes in the process list
    started_processes = proc_new - processes
    exited_processes = processes - proc_new
    # set the new process mapping
    processes = proc_new
    # add started processes for each connection
    for connection in nodes_to_add:
        for node in started_processes:
            nodes_to_add[connection].add(node)
    # add exited processes for each connection
    for connection in nodes_to_remove:
        for node in exited_processes:
            nodes_to_remove[connection].add(node)
    return 0
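The diffing at the heart of the callback is just Python set subtraction; a standalone sketch with made-up process names:

```python
# Previous and current scans (made-up names for illustration).
old_scan = {"bash", "sshd", "vim"}
new_scan = {"bash", "sshd", "tee"}

started = new_scan - old_scan  # present now, absent before
exited = old_scan - new_scan   # present before, absent now

print(sorted(started))  # ['tee']
print(sorted(exited))   # ['vim']
```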
We have constructed a tree structure of processes and we create a per-connection stream of events for process creation and exit. Now we need to construct the web server.
Flask webserver with asyncio
Our web server is based on Flask with socketio. We spin up Flask in its own thread so that we can run pypanda in the main thread. We must start the Flask socketio server with debug=False and use_reloader=False. This is important because Flask wants to run in the primary thread to use advanced features like reloading, and running it in a secondary thread like we do requires turning these options off.
import threading
from flask import Flask, request, render_template
from flask_socketio import SocketIO

app = Flask(__name__)
socketio = SocketIO(app)

def start_flask():
    socketio.run(app, host='0.0.0.0', port=8888,
                 debug=False, use_reloader=False)

x = threading.Thread(target=start_flask)
x.start()
Next, we define a landing page that will display the process graph to us with JavaScript. Here we use graphviz to create the structure, then output the DOT source it generates directly into our HTML.
from graphviz import Graph

@app.route("/")
def graph():
    g = Graph('unix', filename='process', engine='dot')

    def traverse_internal(node):
        if node is None:
            return
        for child in node.children:
            g.edge(str(node), str(child))
            traverse_internal(child)

    # get_pid_object looks up a Process by PID (helper defined elsewhere)
    init = get_pid_object(0)
    traverse_internal(init)
    return render_template('svgtest.html', chart_output=g.source)
Our svgtest.html template is pretty simple. It loads the vis-network JavaScript plugin and other dependencies. It then loads our graph as a DOTstring via the template render syntax {{chart_output|safe}} and renders it into the HTML div with id="graph".
<head>
  <script type="text/javascript" src="https://unpkg.com/vis-network/standalone/umd/vis-network.min.js"></script>
  <script src="//code.jquery.com/jquery-3.3.1.min.js"></script>
  <script type="text/javascript" src="//cdnjs.cloudflare.com/ajax/libs/socket.io/1.3.6/socket.io.min.js"></script>
  <link rel="stylesheet" href="//maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css">
</head>
<div id="graph"></div>
<script type="text/javascript" charset="utf-8">
  $(document).ready(function(){
    // provide data in the DOT language
    var DOTstring = `{{chart_output|safe}}`;
    var parsedData = vis.parseDOTNetwork(DOTstring);
    nodes = new vis.DataSet(parsedData.nodes);
    edges = new vis.DataSet(parsedData.edges);
    var data = {nodes: nodes, edges: edges};
    var container = document.getElementById('graph');
    var options = {layout: {improvedLayout: false}};
    // create a network
    var network = new vis.Network(container, data, options);
  });
</script>
This is a nice static page, but we can make it dynamic by adding socketio events. First, we need to implement the connect and disconnect events, which the socketio library generates automatically.
from threading import Thread, Event

thread = Thread()
thread_stop_event = Event()

@socketio.on('connect', namespace='/test')
def test_connect():
    # need visibility of the global thread object
    global thread
    print('Client connected')
    if not thread.is_alive():
        print("Starting Thread")
        thread = socketio.start_background_task(emitEvents)

@socketio.on('disconnect', namespace='/test')
def test_disconnect():
    print('Client disconnected')
We want to provide regular updates on the virtual machine to the web browser. To handle the updates we start a background task whenever we receive a connection; it takes the event stream our process list generates and sends it along to the website. Each such thread is distinguished by a unique random string generated per connection. We send updates in a loop, dropping processes that appear in both the to_add and to_remove queues. We sort the to_add and to_remove queues by depth: for to_add we prioritize the minimum depth, and for to_remove the maximum depth. Doing so avoids situations where a child process has no apparent connected parent process in the graph.
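The ordering rule can be sketched on its own: emit shallow nodes first when adding and deep nodes first when removing, so an edge's parent endpoint always exists before its child appears. The names and depths below are invented:

```python
# Hypothetical (name, depth) pairs standing in for Process objects.
to_add = [("worker.sh", 3), ("bash", 2), ("init", 1)]
to_remove = [("bash", 2), ("worker.sh", 3)]

# Adds go shallowest-first so parents enter the graph before children.
add_order = sorted(to_add, key=lambda p: p[1])
# Removes go deepest-first so children leave before their parents.
remove_order = sorted(to_remove, key=lambda p: -p[1])

print([name for name, depth in add_order])     # ['init', 'bash', 'worker.sh']
print([name for name, depth in remove_order])  # ['worker.sh', 'bash']
```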
Our socketio library emits an event called newprocess, which has members child (the child process) and pproc (the parent process), as well as whether the pair should be added to or removed from the graph.
from flask_socketio import emit
from string import ascii_lowercase
from random import choice
from threading import Thread, Event

thread = Thread()
thread_stop_event = Event()

def get_random_string(length):
    return ''.join(choice(ascii_lowercase) for i in range(length))

def emitEvents():
    global nodes_to_add
    my_string = get_random_string(8)
    nodes_to_add[my_string] = set()
    nodes_to_remove[my_string] = set()
    my_nodes_to_add = nodes_to_add[my_string]
    my_nodes_to_remove = nodes_to_remove[my_string]
    while not thread_stop_event.is_set():
        # find the intersection of the two queues and drop it from both
        snta = set(my_nodes_to_add)
        sntr = set(my_nodes_to_remove)
        common = snta.intersection(sntr)
        for i in list(common):
            my_nodes_to_add.remove(i)
            my_nodes_to_remove.remove(i)
        # sort nodes to add by depth, shallowest first
        nodes_to_add_sorted = sorted(my_nodes_to_add, key=lambda x: x.depth)
        if nodes_to_add_sorted:
            p = nodes_to_add_sorted[0]
            my_nodes_to_add.remove(p)
            print(f"emitting newprocess {p}")
            parent = p.parent
            socketio.emit('newprocess',
                          {'operation': 'add',
                           'pproc': str(parent),
                           'child': str(p)},
                          namespace='/test')
        # sort nodes to remove by depth, deepest first
        nodes_to_remove_sorted = sorted(my_nodes_to_remove, key=lambda x: -x.depth)
        if nodes_to_remove_sorted:
            p = nodes_to_remove_sorted[0]
            my_nodes_to_remove.remove(p)
            print(f"emitting remove {p}")
            parent = p.parent
            socketio.emit('newprocess',
                          {'operation': 'remove',
                           'pproc': str(parent),
                           'child': str(p)},
                          namespace='/test')
        socketio.sleep(0.1)
Lastly, we need JavaScript code to add and remove nodes, which we insert into our <script> tag from above. It interacts with the nodes and edges variables defined earlier to add or remove nodes and edges as events arrive.
var socket = io.connect('http://' + document.domain
                        + ':' + location.port + '/test');
// receive details from server
socket.on('newprocess', function(msg) {
    console.log(msg);
    if (typeof msg.pproc !== 'undefined' &&
        typeof msg.child !== 'undefined'){
        var pproc = msg.pproc;
        var child = msg.child;
        if (msg.operation == "add"){
            console.log("Adding pair " + pproc + " " + child);
            try{
                nodes.add({id: child, label: child});
                edges.add({to: child, from: pproc});
            }catch (err){
                console.log("fail add");
            }
        }else{
            console.log("Removing pair " + pproc + " " + child);
            try{
                nodes.remove(child);
                // DataSet.remove takes ids, so look up the matching edges first
                var stale = edges.get({
                    filter: function(e){ return e.to == child && e.from == pproc; }
                });
                edges.remove(stale.map(function(e){ return e.id; }));
            } catch (err){
                console.log("fail remove");
            }
        }
    }else{
        console.log("pproc or child is undefined");
    }
});
Conclusions
The Python extension for PANDA is powerful and allows for really interesting and fun analyses. In our example here we had a web server with asyncio functionality and a script driving a virtual machine, all in a single Python script. I am really excited to see what else can be produced with this!
Take a look at the source code here. Grab the docker image here.