Welcome back to another fuzzing blog post. This time let’s talk about grammar-based fuzzing! I will be writing about how I tried to fuzz a few PDF applications such as Foxit and Adobe.
In order to do that, I used the following tools:
domato, grab it from its repo while it’s fresh!
Debenu Quick PDF Library. For my campaign I used version 17.11, the current version at the time of writing, but YMMV. Note that you need to register in order to request a trial.
BugId, to help us triage and save any crashes.
Your favourite PDF parser/software!
Grammar Based Fuzzing
From the wiki: A smart (model-based, grammar-based, or protocol-based) fuzzer leverages the input model to generate a greater proportion of valid inputs. For instance, if the input can be modelled as an abstract syntax tree, then a smart mutation-based fuzzer would employ random transformations to move complete subtrees from one node to another. If the input can be modelled by a formal grammar, a smart generation-based fuzzer would instantiate the production rules to generate inputs that are valid with respect to the grammar. However, generally the input model must be explicitly provided, which is difficult to do when the model is proprietary, unknown, or very complex.
In short, a grammar-based fuzzer is aware of the input structure. Unlike dumb fuzzing, where we simply mutate bytes without any knowledge of the target, file format or network protocol specification, here we know the structure (in our case, the API presented below) and generate test cases based on that specification.
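To make the difference concrete, here is a minimal Python sketch that is not part of domato or any other tool mentioned here: a dumb mutator that overwrites random bytes of a seed call, next to a toy generator that always emits a call with three numbers and one string. The seed and the value pools are illustrative assumptions.

```python
import random

# Illustrative seed input (an assumption, modelled on the DrawHTMLText call).
SEED = b'QP.DrawHTMLText(200.0, 400.0, 800.0, "my text")'


def dumb_mutate(data: bytes, flips: int, rng: random.Random) -> bytes:
    """Dumb fuzzing: overwrite `flips` random bytes, ignoring any structure."""
    buf = bytearray(data)
    for _ in range(flips):
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def structured_generate(rng: random.Random) -> str:
    """Structure-aware: always three numbers and one quoted string, in order."""
    numbers = [rng.choice([0.0, -1.0, 200.0, 1e308]) for _ in range(3)]
    text = rng.choice(["my text", "<b>" * 100, ""])
    args = ", ".join(repr(n) for n in numbers)
    return f'QP.DrawHTMLText({args}, "{text}")'


if __name__ == "__main__":
    rng = random.Random(1)
    print(dumb_mutate(SEED, 5, rng))  # the call may no longer even parse
    print(structured_generate(rng))   # bogus values, but the shape is preserved
```

The mutated bytes quickly stop looking like a call at all, so most of them would be rejected before reaching interesting code; the generated calls keep the shape the target expects while still carrying hostile values.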
There are many tutorials out there, but I recommend having a look at domato’s page, where you can fully understand how it works. In this post, we will create a grammar so that the function
int DPLDrawHTMLText(int InstanceID, double Left, double Top, double Width, wchar_t * HTMLText)
can be called with bogus, yet valid, input.
Getting started with Debenu Quick PDF Library
Once you obtain your trial and install it, you need to register the ActiveX DLL.
This can be done by running either:
%systemroot%\System32\regsvr32.exe, to register the 64-bit version of the DLL, or
%systemroot%\SysWOW64\regsvr32.exe, to register the 32-bit version (DebenuPDFLibraryAX1711.dll).
While you are there, make sure to note down the key in TRIAL_LICENSE_KEY.TXT, as you’ll need it later for generating the files.
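If you prefer to script the registration step, a small helper can pick the right regsvr32.exe for the DLL’s bitness. This is only a sketch: the `regsvr32_command` helper is hypothetical, and the bare DLL file name assumes you run it from the library’s installation folder (adjust the path to your setup).

```python
import ntpath
import os
from typing import List, Optional


def regsvr32_command(dll: str, bits: int, systemroot: Optional[str] = None) -> List[str]:
    """Return the regsvr32 command line that registers `dll` for the given bitness.

    On 64-bit Windows, System32 holds the 64-bit regsvr32.exe and SysWOW64
    holds the 32-bit one, which is why the folder depends on `bits`.
    """
    root = systemroot or os.environ.get("SystemRoot", r"C:\Windows")
    folders = {64: "System32", 32: "SysWOW64"}
    if bits not in folders:
        raise ValueError("bits must be 32 or 64")
    # ntpath builds Windows-style paths even when this runs on another OS.
    return [ntpath.join(root, folders[bits], "regsvr32.exe"), dll]


if __name__ == "__main__":
    # Hypothetical DLL location; print the command instead of running it.
    print(" ".join(regsvr32_command("DebenuPDFLibraryAX1711.dll", 32)))
```

On the target machine you would pass the returned list to `subprocess.run(cmd, check=True)` from an elevated prompt, since registering a COM server normally requires administrator rights.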
Exploring the library and reading the documentation, we can see that it offers a variety of bindings, from C#, C++, Delphi and Objective-C to Perl, PHP, VB6, VBScript and Visual Basic (.NET). If you want to experiment, go ahead and check this page! Moreover, the library provides many function groups that can be targeted.
In my case, I ended up using the Visual Basic and Perl bindings. Once you have created a grammar, it’s very easy to modify the template to use another language, and that’s the beauty of grammar-based fuzzing!
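To illustrate how little changes between bindings, the sketch below renders one abstract call in Perl and VBScript syntax. The `render_call` helper and the object name `QP` are assumptions for illustration (the Perl form mirrors the `$QP->` calls used in the Perl template), and string escaping is deliberately ignored.

```python
# Hypothetical per-binding call templates; the object is assumed to be named QP.
BINDINGS = {
    "perl": "$QP->{name}({args});",
    "vbscript": "Call QP.{name}({args})",
}


def fmt(value) -> str:
    """Quote strings and emit numbers as-is (no escaping, for brevity)."""
    if isinstance(value, str):
        return '"' + value + '"'
    return str(value)


def render_call(binding: str, name: str, args: list) -> str:
    """Render one abstract method call in the syntax of the chosen binding."""
    return BINDINGS[binding].format(name=name, args=", ".join(fmt(a) for a in args))


if __name__ == "__main__":
    call = ("DrawHTMLText", [200.0, 400.0, 800.0, "my text"])
    for binding in BINDINGS:
        print(render_call(binding, *call))
```

The grammar only has to describe the arguments; supporting another binding means adding one more format string.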
Let’s start with the provided Visual Basic sample, hello-world.vbs.
Executing it with the 32-bit version of the DLL generates hello-world.pdf.
Opening it with Foxit, we can confirm that our file has been generated!
Success! Within a few minutes, we managed to set up the library, grab some sample code and generate a valid PDF. Let’s move on!
Creating the grammar
To demonstrate domato’s capabilities, let’s target the DrawHTMLText function introduced earlier.
Leaving aside InstanceID, which the DLL edition uses to identify the library instance, this function expects four parameters:
double Left, double Top, double Width, wchar_t * HTMLText
As such, the SDK expects a call like the following:
DrawHTMLText(200.0, 400.0, 800.0, "my text")
Forming the above function call with domato and creating a grammar is straightforward: we simply need to define a symbol and assign it its corresponding value.
The value can be, for example, MIN_INT or other interesting values: boundary values that commonly lead to signed/unsigned integer overflows/underflows or undefined behaviour.
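To show what defining a symbol and assigning its value boils down to, here is a toy expander in Python. It is not domato and does not use domato’s grammar syntax; it only mimics the core idea of replacing `<symbol>` references with randomly chosen productions. The symbol names and the interesting values (32-bit boundaries such as MIN_INT = -2147483648) are illustrative assumptions.

```python
import random
import re

# Toy grammar in the spirit of domato: symbol -> alternative productions.
# The interesting values are illustrative 32-bit boundaries, not a complete set.
GRAMMAR = {
    "call": ['$QP->DrawHTMLText(<number>, <number>, <number>, "<text>");'],
    "number": ["-2147483648", "2147483647", "4294967295", "-1", "0", "200.0", "1e308"],
    "text": ["my text", "<b>bold</b>", "<text><text>", ""],
}

REF = re.compile(r"<([a-z_]+)>")


def expand(symbol: str, rng: random.Random, depth: int = 0) -> str:
    """Pick a production for `symbol` and recursively expand the <refs> it contains."""
    if depth > 20:
        return ""  # stop runaway recursion (e.g. <text> referring to itself)
    production = rng.choice(GRAMMAR[symbol])

    def replace(match):
        name = match.group(1)
        # Tags such as <b> are not grammar symbols, so they are kept verbatim.
        return expand(name, rng, depth + 1) if name in GRAMMAR else match.group(0)

    return REF.sub(replace, production)


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(3):
        print(expand("call", rng))
```

Every output is a well-formed call, but the arguments are drawn from values that stress integer and float handling.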
Continuing, since we will be generating programming language code, we have to include the !begin lines and !end lines keywords.
Following the API specification, the <HTMLText> symbol, which builds the DrawHTMLText call, can be formed within literally a few lines.
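In domato, rules placed between `!begin lines` and `!end lines` produce whole statements, and the generator then asks for a number of such lines. A rough standard-library Python analogue is sketched below; the `LINE_RULES` list, the value pools and the tag names are hypothetical, and only the DrawHTMLText call comes from the API discussed above.

```python
import random

BOUNDARIES = [-2147483648, 2147483647, 0, -1]


def rand_number(rng: random.Random) -> str:
    # Mostly boundary values, sometimes an ordinary coordinate.
    if rng.random() < 0.7:
        return str(rng.choice(BOUNDARIES))
    return f"{rng.uniform(0, 1000):.2f}"


def rand_html(rng: random.Random) -> str:
    # A simple tag wrapping a run of characters of varying length.
    tag = rng.choice(["b", "i", "u", "font"])
    return f"<{tag}>" + "A" * rng.choice([1, 10, 5000]) + f"</{tag}>"


# Each line rule returns one complete statement, like a rule between
# !begin lines and !end lines. More API calls could be appended here.
LINE_RULES = [
    lambda rng: "$QP->DrawHTMLText({}, {}, {}, \"{}\");".format(
        rand_number(rng), rand_number(rng), rand_number(rng), rand_html(rng)),
]


def generate_lines(count: int, rng: random.Random) -> str:
    """Return `count` generated statements, one per line."""
    return "\n".join(rng.choice(LINE_RULES)(rng) for _ in range(count))


if __name__ == "__main__":
    print(generate_lines(5, random.Random(7)))
```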
Creating the template.pl
Once you have the basic grammar, how do we call these functions from our binding? Looking at existing code on GitHub, it turns out we simply need to provide the sample code we were given, with slight modifications.
In this template, the <DPLFuzz> placeholder gets substituted with the generated $QP-><HTMLText> cases. Once domato has done its magic, the template is filled with a series of these calls.
Our next step is to create the script that actually generates test cases from this grammar (called a generator). This can be achieved by reusing an existing one, such as Ivan’s generator.py, with a few modifications.
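A stripped-down generator in the same spirit as generator.py could look like the sketch below: it fills a template’s `<DPLFuzz>` placeholder with generated calls and writes one Perl file per test case. The template text, the `fuzz-NNNNN.pl` naming and the `make_line` stand-in are assumptions; a real setup would ask domato’s grammar for the lines instead of calling `make_line`.

```python
import random
import tempfile
from pathlib import Path

# Hypothetical template: the vendor's sample code would surround the marker.
TEMPLATE = """# ... library setup from the sample code ...
<DPLFuzz>
# ... code that saves the PDF ...
"""

MARKER = "<DPLFuzz>"


def make_line(rng: random.Random) -> str:
    # Stand-in for the grammar: one DrawHTMLText call with boundary-ish values.
    a, b, c = (rng.choice([0, -1, 2147483647, 200.5]) for _ in range(3))
    return f'$QP->DrawHTMLText({a}, {b}, {c}, "fuzz");'


def generate_cases(out_dir, count, lines_per_case=10, seed=None):
    """Write `count` Perl test cases into `out_dir` and return their paths."""
    rng = random.Random(seed)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(count):
        body = "\n".join(make_line(rng) for _ in range(lines_per_case))
        path = out / f"fuzz-{i:05d}.pl"
        path.write_text(TEMPLATE.replace(MARKER, body))
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for p in generate_cases(d, 2, lines_per_case=3, seed=1):
            print(p.name)
            print(p.read_text())
```

Zero-padded, sequential names keep the cases sorted and make it easy to map a crashing PDF back to the script that produced it.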
Saving the actual test cases
Before we continue, notice that in the provided sample code (hello-world.vbs) this line was responsible for setting the output file name:
FileName = "hello-world.pdf". This value is hardcoded and certainly does not suit us.
To solve this issue, I’ve coded something very simple: a Python script which finds the placeholder, the hardcoded value XXX, and replaces it with the actual file name.
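A minimal version of such a script is sketched below. The `stamp_filename` name and the convention of naming each PDF after its test case are assumptions for this sketch; the only part taken from the post is swapping the XXX placeholder for a real file name. Because every occurrence is replaced, the placeholder should be a string the grammar can never emit.

```python
import tempfile
from pathlib import Path

PLACEHOLDER = "XXX"


def stamp_filename(case_path, pdf_dir) -> Path:
    """Replace the XXX placeholder in one generated case with a unique PDF path.

    The PDF is named after the case (fuzz-00001.pl -> fuzz-00001.pdf), so a
    crashing PDF can always be traced back to the script that produced it.
    """
    case = Path(case_path)
    pdf_path = Path(pdf_dir) / (case.stem + ".pdf")
    source = case.read_text()
    if PLACEHOLDER not in source:
        raise ValueError(f"{case} contains no {PLACEHOLDER} placeholder")
    # Forward slashes avoid backslash-escaping issues inside script string literals.
    case.write_text(source.replace(PLACEHOLDER, pdf_path.as_posix()))
    return pdf_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        case = Path(d) / "fuzz-00001.pl"
        case.write_text("my $FileName = 'XXX';\n")  # hypothetical template line
        print(stamp_filename(case, d))
        print(case.read_text())
```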
BugId and you!
If you haven’t read the Fuzz in sixty seconds blog post yet, please take some time to see how BugId can be integrated into your fuzzing workflow. The idea is very similar, but instead of fuzzing browsers, we loop through the generated cases one by one; I have modified some parts to reflect those changes.
Essentially, we execute domato’s generator, replace the XXX marker with the actual file name, run the Perl cases generated by domato, and finally save the generated PDFs to our test folder.
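The execution part of that loop can be sketched in Python as follows, assuming the cases were already generated and stamped with their output names. This is not the BAT file used in the campaign: the `run_cases` helper is hypothetical, `perl` is assumed to be on PATH, and the `runner` parameter exists only so the loop can be dry-run without Perl.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_cases(cases, pdf_dir, interpreter="perl", runner=subprocess.run, timeout=60):
    """Run each generated script so it writes its PDF; return the PDFs produced."""
    produced = []
    for case in cases:
        pdf = Path(pdf_dir) / (Path(case).stem + ".pdf")
        try:
            result = runner([interpreter, str(case)], capture_output=True,
                            text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            print(f"[!] {case}: timed out", file=sys.stderr)
            continue
        if result.returncode != 0:
            # The binding rejected the case (e.g. bad arguments): skip it.
            print(f"[!] {case}: {result.stderr.strip()}", file=sys.stderr)
            continue
        if pdf.exists():
            produced.append(pdf)
    return produced


if __name__ == "__main__":
    # Dry run without Perl: a fake runner that "produces" a PDF for even cases only.
    def fake_runner(argv, **kwargs):
        case = Path(argv[1])
        ok = int(case.stem.split("-")[1]) % 2 == 0
        if ok:
            case.with_suffix(".pdf").write_bytes(b"%PDF-1.4\n")
        return subprocess.CompletedProcess(argv, 0 if ok else 1, "", "bad arguments")

    with tempfile.TemporaryDirectory() as d:
        cases = [Path(d) / f"fuzz-{i:05d}.pl" for i in range(4)]
        print(run_cases(cases, d, runner=fake_runner))
```

The returned PDFs are the files you would then hand to BugId one at a time, opening each with the target reader.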
With the above modifications in place, the BAT file is ready to be executed.
Putting it all together
With all these steps combined, let’s run the cmd file and see how it goes.
Et voilà! Using open source tools and some effort, we are now able to fuzz not only Foxit software, but pretty much any PDF parser out there!
Although I put a lot of effort into everything from creating the grammar to modifying BugId, the only crashes I managed to get were some meaningless NULL pointer dereferences. You’d expect such software to have been fuzzed to death; however, as j00ru once said, according to the bug hunter’s law… there is always one more bug :)
Interestingly, I initially used the Visual Basic bindings; however, whenever a very large integer was passed to these methods, Visual Basic would complain and fail to generate the case.
Note that it also tells you when the parameters or the assignments are wrong. That’s very handy and can be used to your advantage!
In this blog post we’ve covered a very brief introduction to grammar-based fuzzing. We used the Quick PDF Library to apply this knowledge and demonstrated how to create a grammar from scratch. We also fuzzed a sample function within the API by generating structure-aware test cases. Finally, we used BugId to iterate over our cases and catch any crashes. The sky is the limit: this type of fuzzing can be used not only for this specific library, but for any text-based file format or even programming language!
I hope you enjoyed this as much as I did! As always, any ideas, comments or feedback are welcome!