Fuzzing Tinybmp in Rust || From dumb to structure-aware guide
Introduction
In this blog post we will play around with some Rust code and fuzz the BMP header parsing methods within the TinyBMP Rust project. According to the project’s description:
A small BMP parser primarily for embedded, no-std environments but usable anywhere.
This crate is primarily targeted at drawing BMP images to embedded_graphics DrawTargets, but can also be used to parse BMP files for other applications.
While I’ve been trying to learn Rust and understand a bit more about traits, I found this to be a perfect target, as anything related to parsing tends to be prone to vulnerabilities. We will start off by reading the documentation of the project and setting up a simple (dumb) fuzzer, then move on to more interesting topics such as creating a structure-aware fuzzer. Hence, if you haven’t done any fuzzing in Rust and are looking for a beginner tutorial, then hopefully this blog is for you! We will also be utilising cargo-fuzz, a cargo subcommand which uses libFuzzer (and needs LLVM sanitizer support). Before we move on, make sure to install it as per the project instructions so you can follow along. I will be using Kali 64bit for the rest of this tutorial.
Please note all of the discovered issues/bugs here have been reported already to the project owners and have been fixed!
This blog would also not have been possible without Addison’s (@addisoncrump_vr) help: he provided guidance as well as the harness for the smart/structure-aware section, which we will analyse in this blog!
Setting up the project
First things first, in order to be able to run cargo-fuzz you need to install the nightly version of rust (or switch to it):
Let’s start by cloning the repo and reverting it to a state prior to the patches that have been added.
Let’s limit the commit entries to 10:
We’re interested in reverting it to the version 0.3.3 so let’s do that:
Creating a dumb fuzzer
Now that we’ve set up the project, it’s time to experiment and play around with the docs.
Navigating through the project we can see the following sample code:
If you are coming from a winafl/AFL background, you’ll naturally think that you somehow need to figure out a way to provide a file input, mutate it and then pass the fuzzed file to the target. However, cargo-fuzz/LLVM works slightly differently… remember its API is defined as:
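In a cargo-fuzz harness this entry point is declared with the `fuzz_target!` macro from the `libfuzzer-sys` crate, whose closure receives the mutated input as a `&[u8]`. As a rough, standard-library-only sketch of that shape (`toy_parse` and `fuzz_one` are hypothetical names, not tinybmp or cargo-fuzz code):

```rust
/// Hypothetical stand-in for a parser under test: it accepts only inputs
/// that start with the two-byte BMP magic "BM" and returns the payload length.
fn toy_parse(data: &[u8]) -> Result<usize, &'static str> {
    if data.len() < 2 {
        return Err("too short");
    }
    if !data.starts_with(b"BM") {
        return Err("bad magic");
    }
    Ok(data.len() - 2)
}

/// Shape of a fuzz entry point: the fuzzer hands over raw bytes directly,
/// with no file handling involved. In a real cargo-fuzz harness this body
/// would sit inside `fuzz_target!(|data: &[u8]| { ... })`.
fn fuzz_one(data: &[u8]) {
    // Errors are expected for malformed input; only panics count as findings.
    let _ = toy_parse(data);
}
```

Calling `fuzz_one(b"BM\x00\x01")` or `fuzz_one(b"")` simply returns; the fuzzer only reports inputs that make the target panic or trip a sanitizer.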
You need to make sure you’re compiling this version (the vulnerable one) and not the latest one!
Furthermore, looking at ASAN’s stack trace, it looks like the fuzzer is panicking and exiting, and sure enough there are no references to tinybmp code... bummer.
Let’s also see what the test case is about:
Well that’s a bit sketchy, the test case is empty so something is not quite right here. We need to find another way or even better find another function that does something similar to the example and will allow us to target the same functionality. After spending some time reading the docs/examples and the APIs I came across this very interesting one, the RawBmp:
This struct can be used to access the image data in a BMP file at a lower level than with the Bmp struct. It doesn’t do automatic color conversion and doesn’t apply the color table, if it is present in the BMP file.
It even has a method which creates an image from a byte slice - that looks very promising! Let's update the code again:
And run it one more time:
Success! Literally within 3 seconds of running the fuzzer we get the following panic:
thread '<unnamed>' panicked at 'attempt to negate with overflow', /home/kali/Desktop/tinybmp/src/header/dib_header.rs:109:52
Also, notice the following stack traces:
It looks like our quick-and-dirty harness (dumb.rs, line 6) is indeed hitting the parsing functionality we were aiming for. Let’s quickly verify the crasher:
Fantastic! Looking at the header.rs file:
the fuzzer was able to successfully create a new test case with this signature (notice the 2-byte BM magic header) and find an issue. I’d also like to mention here that one of the issues I found was an interesting out-of-bounds read. For the detailed analysis please check the GitHub issue here.
Excellent! With literally a single line of code we were able to unveil some bugs!
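The 'attempt to negate with overflow' panic from earlier can be reproduced in isolation. The sketch below assumes the negated value is a signed 32-bit header field such as the image height (the source line itself is not shown here, so that is a guess), and `safe_height` and `try_negate` are hypothetical helper names rather than tinybmp functions:

```rust
/// Returns the magnitude of a signed 32-bit value without panicking.
/// `i32::MIN` has no positive counterpart in `i32`, so a plain `-height`
/// overflows (and panics when overflow checks are enabled).
fn safe_height(height: i32) -> u32 {
    height.unsigned_abs()
}

/// `checked_neg` surfaces the overflow as `None` instead of panicking,
/// which lets a parser turn it into a proper error.
fn try_negate(value: i32) -> Option<i32> {
    value.checked_neg()
}
```

A fuzzer reaches this edge case quickly because a field of four `0x00 0x00 0x00 0x80` bytes decodes to `i32::MIN` in little-endian order.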
Coverage
Remember that it’s essential to check coverage, so let’s do that. Before proceeding, make sure to install llvm-profdata for the Rust toolchain. Let’s run the coverage command:
cargo fuzz coverage dumb
Ok, coverage data has been saved, let’s try to convert and view it:
Looking at raw_bmp.rs reveals that lines 73-104 were never hit. Within the RawBmp implementation we can see that ParseError::InvalidImageDimensions was never hit, and neither were any of the functions in the above image.
Patches verification and 2nd round of fuzzing
Let’s revert it back to the patched state:
and re-run the fuzzing campaign for five minutes..
cargo fuzz run dumb -- -max_total_time=300
Hmm! As you can see from the above image, it looks like the project maintainers have done a great job: they’ve added lots of verification and improved the header parsing so that dumb fuzzing won’t find any low-hanging fruit.
Time for us to skill up and move to smart fuzzing!
Structure-Aware Fuzzing
Now it’s time to invest a bit more time and get a better understanding of the parsing mechanism. We will be using the harness provided here.
Create a new project and paste the harness from the above link:
cargo fuzz add structured
Before starting, make sure you add these extra dependencies to your main Cargo.toml:
Let’s try to break down and understand what this harness does.
Lines 4-16 define our modules. The most interesting one that we will be using is the arbitrary one which as per documentation:
This crate is primarily intended to be combined with a fuzzer like libFuzzer and cargo-fuzz or AFL, and to help you turn the raw, untyped byte buffers that they produce into well-typed, valid, structured values. This allows you to combine structure-aware test case generation with coverage-guided, mutation-based fuzzers.
We will be also importing a few other crates such as the Point and rand::rngs::StdRng because we need them for the harness.
Line 18 #[allow(non_camel_case_types)] disables the camel case warnings.
Lines 21-26 create a new enum type, DibType, which is required so we can initialise the header size. Notice also how on line 19 we derive the Debug, Copy, Clone, PartialOrd and PartialEq traits for the DibType enum with #[derive(Debug, Copy, Clone, PartialOrd, PartialEq)]. If we don’t do that, the harness won’t compile:
Moving on:
Similarly to the previous type, we again derive the required traits and define the Rgb and Bitfields values. The following lines define a more interesting struct, FuzzyBmp.
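The derive pattern used for these two types can be sketched with the standard library alone. The variant names and the `header_size` helper below are assumptions for illustration and may differ from the real harness; the byte sizes are the standard lengths of the BMP info, V3, V4 and V5 DIB headers:

```rust
// Variant names are a guess modelled on the constant names in the text.
#[allow(non_camel_case_types)]
#[derive(Debug, Copy, Clone, PartialOrd, PartialEq)]
enum DibType {
    DIB_INFO_HEADER_SIZE,
    DIB_V3_HEADER_SIZE,
    DIB_V4_HEADER_SIZE,
    DIB_V5_HEADER_SIZE,
}

impl DibType {
    /// Hypothetical helper: size in bytes of each DIB header variant.
    fn header_size(self) -> u32 {
        match self {
            DibType::DIB_INFO_HEADER_SIZE => 40,
            DibType::DIB_V3_HEADER_SIZE => 56,
            DibType::DIB_V4_HEADER_SIZE => 108,
            DibType::DIB_V5_HEADER_SIZE => 124,
        }
    }
}

/// Second enum with the same derives, mirroring the Rgb/Bitfields pair.
#[derive(Debug, Copy, Clone, PartialOrd, PartialEq)]
enum Compression {
    Rgb,
    Bitfields,
}
```

Deriving `Copy` and `PartialEq` is what allows values such as `DibType::DIB_V4_HEADER_SIZE` to be passed around by value and compared, while `Debug` is needed to print them when a test case is reported.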
If you’ve previously played with Rust you will immediately recognise u32 and i32, which stand for 32-bit unsigned and signed integers. In addition to those, we are using the Box and <u8> types, so what are they?
It should be noted that some of these values are taken from the dib_header.rs code which will be used for creating a smart-ish BMP file:
There are a few fields that need our attention here. Out of the box, the arbitrary crate knows how to generate primitive types (signed/unsigned integers such as i32/u32, chars and so on) and many standard library types from the fuzzer’s raw bytes. However, in this struct we have defined some custom ones such as DibType and Bpp. Later in this section we will see how to implement the Arbitrary trait so that these custom types can also be generated from libFuzzer’s input.
The From trait allows for a type to define how to create itself from another type, hence providing a very simple mechanism for converting between several types. There are numerous implementations of this trait within the standard library for conversion of primitive and common types.
In short, on lines 52-53 we implement the From trait so that a FuzzyBmp can be converted into a Vec<u8>. Then on line 54, we declare a new mutable vector, and we slowly start filling in the values for the BMP file format. Since this is a Vec<u8>, we will be using Vec::extend_from_slice to append to the vector.
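To make the conversion concrete, here is a minimal sketch of the same idea. `MiniBmp` is a hypothetical, cut-down stand-in for FuzzyBmp: it only emits the 14-byte BMP file header followed by the pixel data and skips the DIB header entirely, so the output is not a valid BMP on its own:

```rust
/// Hypothetical, cut-down stand-in for the harness's `FuzzyBmp`.
struct MiniBmp {
    image_data: Vec<u8>,
}

impl From<MiniBmp> for Vec<u8> {
    fn from(bmp: MiniBmp) -> Self {
        // The BMP file header is 14 bytes: magic (2), file size (4),
        // two reserved u16 fields (4) and the pixel data offset (4).
        const FILE_HEADER_LEN: u32 = 14;
        let file_size = FILE_HEADER_LEN + bmp.image_data.len() as u32;

        let mut out = Vec::new();
        out.extend_from_slice(b"BM"); // magic
        out.extend_from_slice(&file_size.to_le_bytes()); // total file size
        out.extend_from_slice(&0u32.to_le_bytes()); // reserved fields
        out.extend_from_slice(&FILE_HEADER_LEN.to_le_bytes()); // data offset
        out.extend_from_slice(&bmp.image_data); // pixel data
        out
    }
}
```

With this in place, `let bytes: Vec<u8> = MiniBmp { image_data: vec![1, 2, 3] }.into();` produces the serialized buffer. All multi-byte fields use `to_le_bytes` because BMP stores integers in little-endian order.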
Then on lines 59-69 we start crafting the header. Before moving on with the DIB header, let’s make the fuzzer panic on purpose and print the contents so far, so we can verify we are on the right track:
and running the fuzzer yields the following:
So far, so good. We’ve managed to correctly populate the right values for the image header. Let’s continue with the DIB header:
Now let’s move on to arbitrary trait implementation.
Here we are using arbitrary’s Unstructured data, which as per documentation:
An Unstructured helps Arbitrary implementations interpret raw data (typically provided by a fuzzer) as a “DNA string” that describes how to construct the Arbitrary type. The goal is that a small change to the “DNA string” (the raw data wrapped by an Unstructured) results in a small change to the generated Arbitrary instance. This helps a fuzzer efficiently explore the Arbitrary’s input space.
Unstructured is deterministic: given the same raw data, the same series of API calls will return the same results (modulo system resource constraints, like running out of memory). However, Unstructured does not guarantee anything beyond that: it makes not guarantee that it will yield bytes from the underlying data in any particular order.
You shouldn’t generally need to use an Unstructured unless you are writing a custom Arbitrary implementation by hand, instead of deriving it. Mostly, you should just be passing it through to nested Arbitrary::arbitrary calls.
We start off with the DibType, where one of the DIB_INFO_HEADER_SIZE, DIB_V3_HEADER_SIZE, DIB_V4_HEADER_SIZE and DIB_V5_HEADER_SIZE values is randomly selected. The same kind of smart values are then generated for bpp. The compress value is one of just two: either Rgb or Bitfields. On lines 164-173 we generate more smart-ish values that make sense for the parsing. Then on lines 185-197 we generate the data_len, which was previously hardcoded. Continuing, on line 195 a new random generator is declared, which will be used to fill the image_data variable with random data.
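The idea of picking "smart" values from a fixed set, driven by the fuzzer's bytes, can be illustrated without the arbitrary crate. `ByteSource` below is a hypothetical, much simplified stand-in for `Unstructured` (whose real `choose` method returns a `Result`), and the two constant tables are illustrative BMP header sizes and bit depths rather than values copied from the harness:

```rust
/// Hypothetical stand-in for `arbitrary::Unstructured`: it only hands out
/// bytes from the front of the fuzzer-provided buffer.
struct ByteSource<'a> {
    data: &'a [u8],
}

impl<'a> ByteSource<'a> {
    fn new(data: &'a [u8]) -> Self {
        ByteSource { data }
    }

    /// Takes the next input byte, or 0 once the input is exhausted.
    fn next_byte(&mut self) -> u8 {
        match self.data.split_first() {
            Some((&first, rest)) => {
                self.data = rest;
                first
            }
            None => 0,
        }
    }

    /// Picks one element of `choices` (which must be non-empty),
    /// using the next input byte as the index.
    fn choose<T: Copy>(&mut self, choices: &[T]) -> T {
        choices[self.next_byte() as usize % choices.len()]
    }
}

/// Illustrative value sets: standard DIB header sizes and BMP bit depths.
const DIB_HEADER_SIZES: [u32; 4] = [40, 56, 108, 124];
const BITS_PER_PIXEL: [u16; 6] = [1, 4, 8, 16, 24, 32];
```

Because every choice depends on a single input byte, flipping that byte changes exactly one field of the generated structure, which is the "small change in, small change out" property the documentation quoted above describes.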
Lines 200-208 create a vector filled with random colour table values. We are using an iterator chained with the take() method to fill it with only colour_table_num_entries * 4 values.
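The iterator-plus-take() pattern can be sketched as follows. To stay dependency-free, the example swaps in a tiny xorshift generator (`XorShift32`, a hypothetical stand-in for `rand::rngs::StdRng`), and `colour_table` is an illustrative function name, not the harness's own:

```rust
/// Tiny xorshift32 generator, used here only so the sketch needs no crates.
struct XorShift32(u32);

impl XorShift32 {
    fn next_u8(&mut self) -> u8 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.0 = x;
        (x >> 24) as u8
    }
}

/// Builds a colour table of `colour_table_num_entries` entries,
/// four bytes each, from a seeded generator.
fn colour_table(seed: u32, colour_table_num_entries: usize) -> Vec<u8> {
    // xorshift must not start from 0, otherwise it only produces zeros.
    let mut rng = XorShift32(seed.max(1));
    std::iter::repeat_with(|| rng.next_u8())
        .take(colour_table_num_entries * 4)
        .collect()
}
```

`repeat_with` yields an endless stream of bytes and `take` cuts it off at the required length, so the table is always exactly as long as the header claims. Seeding the generator keeps the output reproducible for a given test case.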
I’ve added a few print statements and here are a few examples of the generated data:
Would yield:
In the last bits of the arbitrary implementation we return the value wrapped in Ok, since we need to return a Result according to the function signature. Finally, we implement the size_hint() function, which gives a hint of how many bytes of the Unstructured this type needs to construct itself.
If we run the fuzzer a couple of times we can see that indeed the smart values are properly generated. First run:
Second run:
We’re getting close to finishing the harness. Let’s take a look at the last bits:
On line 231 we create the new bmp vector (derived from the FuzzyBmp structure) and we again use the familiar RawBmp::from_slice() method (which we used in our dumb fuzzer), but this time we provide the smart bmp structure. Also notice that, looking at the raw_bmp.rs source code, the following snippet shows that the pixel() function expects a Point structure as a parameter, and that’s what we provide on lines 234-240.
At this stage let’s print the contents of the FuzzyBmp vector again:
This looks more complete, we’ve now calculated dynamically the file_size, as well as the image_data_len.
The above code snippet is the bit that cargo-fuzz uses to start mutating data: it calls the previously defined do_fuzz() function.
Finally, these last bits will not be used within cargo-fuzz. If we compile the harness and run it, we can see that it expects a directory as a parameter and reads the contents of the files in it (assuming we provided BMP files). If we print the contents we can see the following:
We are done with our analysis, let’s kick off the fuzzer now:
Unfortunately this improved harness didn’t yield any new bugs!
Coverage round 2
Let’s do this one more time running the smart-ish fuzzer:
cargo fuzz coverage structured
and after converting the data to HTML we get:
Fantastic! It took a lot of effort, but as you can see, this time we were able to hit all those functions (1.7k and 3.24 million times!) and get decent coverage.
Conclusion
We started by finding a fun target, created a dumb fuzzer and found some bugs with it. Then we moved on to a smart-ish/structure-aware approach and, despite the fact that we were not able to uncover new bugs, we learnt how to mess around with the arbitrary trait and dug a bit deeper into the internals of the project. Hope you enjoyed it and learnt something - I definitely did!