From Volatility to Graylog

Importing Volatility data into a SIEM gives you an easy way to filter and pivot around the data, and it also lets you automate some of the threat analysis. Staring at Volatility output in a terminal isn't as fun as it should be, but figuring out how to import that data into a SIEM can be a daunting task as well, especially if you are not used to it. Volatility has a built-in unified-output module that writes a plugin's output to a file type of your choice. One of the output options is JSON, which, as we all know, can be easily imported into most SIEMs. Here is an example of the unified-output module:

-f memory.001 --profile=Win7SP1x86 netscan --output=json --output-file=/tmp/json/netscan.json
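The shape of that file explains why it arrives in Graylog as a single document. The sketch below assumes the layout Volatility 2's JSON renderer produces: one object holding a `columns` list and a `rows` list of lists. `load_unified_output` is a hypothetical helper name, so confirm the layout against your own output files before relying on it.

```python
import json
from typing import Any


def load_unified_output(path: str) -> tuple[list[str], list[list[Any]]]:
    """Read a Volatility unified-output JSON file.

    Assumes the whole plugin run is one JSON object of the form
    {"columns": [...], "rows": [[...], ...]}.
    """
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    columns = data["columns"]
    rows = data["rows"]
    # Each row is expected to carry exactly one value per column.
    for row in rows:
        if len(row) != len(columns):
            raise ValueError(f"row has {len(row)} values, expected {len(columns)}")
    return columns, rows
```

Because everything sits inside one object, shipping the file as-is gives Graylog a single message rather than one message per row.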

The problem, though, is that if you are importing this data into Graylog, or any SIEM for that matter, some critical information is missing, especially when doing analysis on multiple machines, and there are some formatting issues as well. If you are just doing one-off analysis on specific machines, the data isn't really missing, as it is normally documented in the file name or elsewhere, and including it in the output file would be fairly useless outside of a SIEM. But since we are sending this data to a SIEM, there are three issues, or better yet "features", missing from the created JSON file:

  1. Host name or IP address.
  2. Plugin used to create JSON file.
  3. Due to how the JSON file is created, if the file is shipped directly to Graylog, Graylog will store the data as a single document. This isn't a blocker, since extractors or a pipeline can be created to pull out the fields, but why do additional work if you do not have to?

I couldn't find a utility that solved all three of these issues, and I felt that solving them was out of the scope of extending the unified-output module, so I ended up writing a standalone Python utility called vol2log. It adds the plugin name and the host name or IP address of the memory dump you ran Volatility against, both of which you specify on the command line, and posts the data to Graylog in a way that does not require any additional processing to extract the fields.
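The core transformation can be sketched as turning each row into its own record and stamping it with the plugin and host. This is not vol2log's actual code: `rows_to_records` and the `plugin`/`volHost` field names are hypothetical, and the sketch assumes the file has already been split into its `columns` and `rows` lists (the layout Volatility 2's JSON renderer is assumed to produce).

```python
from typing import Any


def rows_to_records(columns: list[str], rows: list[list[Any]],
                    plugin: str, vol_host: str) -> list[dict[str, Any]]:
    """Turn one columns/rows table into one record per row.

    Each record carries the plugin name and the analysed host, so output
    from many machines and plugins can be searched side by side.
    """
    records = []
    for row in rows:
        # Pair every column name with the value at the same position.
        record = dict(zip(columns, row))
        # Add the two "features" the JSON file is missing.
        record["plugin"] = plugin
        record["volHost"] = vol_host
        records.append(record)
    return records
```

If a plugin ever emits a column literally named `plugin` or `volHost`, the added fields would overwrite it, so pick other names if that matters for your data.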

Here is a partial example of a JSON file posted without passing it through vol2log, and without any additional pipelines or extractors to populate the fields:




As I mentioned earlier, the data appears as a single document in our Graylog instance, which doesn't let us manipulate or search through it. We could add extractors or pipelines to our input to populate those fields, but we would have to do that for almost every plugin, since different plugins produce different fields. This alone makes vol2log an easy choice for sending in data: it handles the fields dynamically by taking advantage of the GELF HTTP input to populate them. In addition, we do not need to edit any configuration files to ship the data to our Graylog instance. We simply pass a few arguments to vol2log, and the data is posted. Here is an example of our posted data with vol2log:
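The GELF HTTP input makes dynamic fields possible because every key prefixed with an underscore is stored as its own field. Per the GELF 1.1 specification, a message needs `version`, `host` and `short_message`, additional field names must match `^[\w\.\-]*$`, and `_id` is reserved. Volatility column names such as `Offset(P)` contain characters outside that set, so a sketch like the following sanitizes them first. The helper names are hypothetical, and using the analysed host as the GELF `host` and the plugin name as `short_message` are assumptions, not necessarily what vol2log does.

```python
import re
from typing import Any

# Characters not allowed in a GELF additional field name.
_INVALID_CHARS = re.compile(r"[^\w.\-]")


def gelf_field_name(column: str) -> str:
    """Map a Volatility column name such as 'Offset(P)' to a GELF field name."""
    name = _INVALID_CHARS.sub("_", column.strip()).strip("_")
    name = "_" + (name or "field")
    if name == "_id":
        # "_id" is reserved by GELF; this rename is a hypothetical choice.
        name = "_id_"
    return name


def to_gelf(record: dict[str, Any], source: str, short_message: str) -> dict[str, Any]:
    """Build a GELF 1.1 message from one flattened row."""
    message: dict[str, Any] = {
        "version": "1.1",
        "host": source,
        "short_message": short_message,
    }
    for key, value in record.items():
        # GELF field values should be strings or numbers; bool is an int
        # subclass in Python, so it is converted to a string explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            value = "" if value is None else str(value)
        message[gelf_field_name(key)] = value
    return message
```

For example, `to_gelf({"Local Address": "0.0.0.0:135"}, "infectedhost", "netscan")` produces a message with a searchable `_Local_Address` field.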

Here is another example using vol2log with output from the netscan plugin:

As you can see, the formatting is much easier to read, and queries can be made to easily pivot throughout the data. Here is an example of vol2log's usage:

python -host -port 12201 -jsonFile "C:\Python\Data\Volatility JSON Files\netscan.json" -plugin netscan -volHost infectedhost

All of the switches above are configured as required, but that is easy to change. I like keeping my code easy to understand, and it's not a complicated script, so it is simple to adapt if this does not fit your needs.
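For anyone rebuilding the command-line handling, the switches in the usage example map directly onto Python's `argparse`, which accepts single-dash long options such as `-host`. This is a sketch, not the script's own parser, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Command-line switches matching the vol2log usage example.

    All five switches are required; drop required=True from any switch
    you want to make optional.
    """
    parser = argparse.ArgumentParser(
        description="Send Volatility unified-output JSON to a Graylog GELF HTTP input.")
    parser.add_argument("-host", required=True, help="Graylog instance IP address or name")
    parser.add_argument("-port", required=True, type=int, help="GELF HTTP input port")
    parser.add_argument("-jsonFile", required=True, help="path to the Volatility JSON file")
    parser.add_argument("-plugin", required=True, help="Volatility plugin that produced the file")
    parser.add_argument("-volHost", required=True,
                        help="host name or IP address of the analysed machine")
    return parser
```

`argparse` keeps each switch name as its attribute name, so the parsed values are available as `args.jsonFile`, `args.volHost` and so on, and a missing switch exits with a usage message.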

Just to cover all of our bases, the other critical part of transferring the data to our SIEM is making sure Graylog is configured to accept our JSON post. One reason I enjoy working with Graylog so much is how simple it is to create a listener. You simply select the GELF HTTP input, choose the port to listen on, ensure the bind address is either set to its default option of or to a host IP address that is reachable from wherever you are importing your data, and start the input. Here is an example of the configuration:
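Once the input is running, Graylog's GELF HTTP input accepts a JSON message POSTed to the `/gelf` path on the configured port. A minimal standard-library sketch of that request follows; the helper names are hypothetical, and the plain `http://` scheme assumes TLS is not enabled on the input.

```python
import json
import urllib.request
from typing import Any


def build_gelf_request(host: str, port: int, message: dict[str, Any]) -> urllib.request.Request:
    """Prepare a POST of one GELF message to a Graylog GELF HTTP input."""
    url = f"http://{host}:{port}/gelf"
    body = json.dumps(message).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_gelf(host: str, port: int, message: dict[str, Any], timeout: float = 10.0) -> int:
    """Send one message and return the HTTP status code."""
    request = build_gelf_request(host, port, message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

`urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so a rejected message surfaces as an exception, while a successful post returns a 2xx status (Graylog typically answers 202 Accepted).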

Just to summarize how easy this whole process is, here is a quick step-by-step without all the images and explanations:

  1. Create the GELF HTTP input on Graylog and configure the port to listen on. (Ensure the port can be accessed from the remote machine that will be sending the data. Check your iptables/firewalld configs.)
  2. Run your Volatility plugins with the unified-output module, choosing the JSON file format and a file naming convention of your choice.
  3. Run the utility, specifying the following switches:
    • -host <Graylog instance IP>
    • -port <Graylog input port>
    • -plugin <Volatility plugin's name>
    • -volHost <Host name or IP address of the machine whose memory file the Volatility plugin was run against>
    • -jsonFile <Location of the JSON file>
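If your naming convention from step 2 encodes the host and plugin in the file name, the -volHost and -plugin values can be derived from it rather than typed by hand. The `<volHost>_<plugin>.json` layout and `parse_json_name` below are hypothetical. The split happens at the first separator so that plugin names containing underscores, such as `linux_pslist`, survive, which means the host part must not contain the separator.

```python
from pathlib import Path


def parse_json_name(path: str, separator: str = "_") -> tuple[str, str]:
    """Split a file name like 'infectedhost_netscan.json' into (volHost, plugin).

    The '<volHost><separator><plugin>.json' layout is a hypothetical naming
    convention; adjust it to whatever convention you chose in step 2.
    """
    stem = Path(path).stem
    # Split at the first separator only, keeping underscores in plugin names.
    vol_host, sep, plugin = stem.partition(separator)
    if not sep or not vol_host or not plugin:
        raise ValueError(f"{path!r} does not match <volHost>{separator}<plugin>.json")
    return vol_host, plugin
```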

The vol2log utility still needs a lot of development. So far it has only been used to upload single files, one at a time, which it does quite efficiently, but there are many more features I would like to implement to automate more of this process. I just need to find the time to finish it. Hopefully others will find this utility useful as well. My intention is to follow this post up with how to automate some of the threat analysis of this data and to share some of my Graylog pipelines for this use case.

I would also greatly appreciate any input on optimizing the script, as well as any testing of the utility. Thanks in advance!