Focus analysis by controlling instrumentation

This guide will go over how to focus analysis on specific code. This is used to ensure the data that Fuzz Introspector handles is relevant to avoid bloating the report with e.g. data about third-party dependencies.

The way both bug-sanitizers e.g. ASAN, code coverage visualization and Fuzz Intropector work involves doing various efforts at compile time. We can focus our analysis by only applying the logic on specific files. For example:

  • Only code that we instrument with coverage visualisation will show up in the code coverage report.

  • Only code that is instrumented with ASAN will have the necessary logic for detecting ASAN-reported bugs.

  • Only code that is compiled with Fuzz Introspector enabled will be included in the Fuzz Introspector report.

Note

How to instrument code is a wide topic, and can influence both performance and bug finding ability of a fuzzer. See e.g. https://blog.envoyproxy.io/a-stroll-down-fuzzer-optimisation-lane-and-why-instrumentation-policies-matter-f0012ec260b3 for a deeper discussion on some of this.

The Fuzz Introspector reports depends on both the code that is analysed by the Fuzz Introspector frontend and also the code coverage instrumentation as the code coverage report is an input to Fuzz Introspector. In this guide, we will not go into details with the internals of this, but rather show the difference instrumentation policies matter on the output.

For this example we will use libarchive. libarchive depends on libxml. libxml is itself a large software package with tens of thousands of lines of code. This example shows how instrumentation policies matter by first doing a Fuzz Introspector with both libarchive and libxml beign instrumented and included in the Fuzz Introspector analysis, and then doing a run with only libarchive compiled with the relevant instrumentation and libxml without it. This example is based on the following PR that was made to perform this exact change in https://github.com/google/oss-fuzz/pull/9007.

Note

OSS-Fuzz sets environment variables when running build.sh, which causes the necessary instrumentation to be applied during build. These environment variables are not set when the Docker image builds, i.e. building code in the Dockerfile in an OSS-Fuzz project means no instrumentation will be applied.

To show the difference between instrumenting/not instrumenting libxml we will first run a Fuzz Introspector run of the OSS-Fuzz set up of libarchive with the build instructions for libxml inside of the build.sh file. This corresponds to the state of the libarchive project as it was before the changes in https://github.com/google/oss-fuzz/pull/9007/files was merged into OSS-Fuzz:

# Clone a clean version of oss-fuzz
git clone https://github.com/google/oss-fuzz
cd oss-fuzz

# At this point you need to:
# Revert the changes in the mentioned PR, so the instructions for building
# libxml is in projects/libarchive/build.sh and not in projects/libarchive/Dockerfile
# In other words, revert https://github.com/google/oss-fuzz/pull/9007
# Do this revert manually, as other things my have changed in the set up.

# Modify the libarchive build.sh to *not* have the lines:
# https://github.com/google/oss-fuzz/blob/65d4864780850058107f25d529710e84d2365acd/projects/libarchive/build.sh#L18-L24

# Generate an introspector report
python3 infra/helper.py introspector libarchive --seconds=10

# Save the introspector report to a path outside `build` so we have it
# for comparison purposes later.
cp -rf build/out/libarchive/introspector-report/ introspector-report-1

At this point we will revert the changes done in libarchive, which we can achieve with git stash. After the git stash, we can clean up the build and do the exact same steps as above. As such, we do the following steps:

# Clean up build
sudo rm -rf ./build

# Undo the changes to the ``libarchive`` set up.
git stash

# Modify the libarchive build.sh to *not* have the lines:
# https://github.com/google/oss-fuzz/blob/65d4864780850058107f25d529710e84d2365acd/projects/libarchive/build.sh#L18-L24

# Generate an introspector report
python3 infra/helper.py introspector libarchive --seconds=10

# Save the introspector report to a path outside `build`.
cp -rf build/out/libarchive/introspector-report/ introspector-report-2

At this stage we have two different introspector reports: introspector-report-1 which holds the project with libxml included in the analysis, and introspector-report-2 which holds the project with libxml excluded from the analysis. We will now start two web servers and so we can observe the differences in the reports.

The differences between the reports are visible. For example, the total number of functions and cyclomatic complexity changes between the two instances of the project. This is shown by the following two figures.

libarchive overview stats with libxml included in the analysis:

libarchive overview with libxml

libarchive overview stats with libxml excluded in the analysis:

libarchive overview without libxml

The difference is also visible elsewhere, for example the project functions overview table. The following figure shows libarchive function overview with libxml included in the analysis. Notice the search box queries for libxml to display the number of libxml functions, totalling to 2607 entries:

libarchive function overview with libxml

libarchive function overview stats without libxml included in analysis. Notice the same query as above is in the search box, but there are no entries that matches the query:

libarchive function overview without libxml