Focus analysis by controlling instrumentation
This guide shows how to focus analysis on specific code. Doing so keeps the data that Fuzz Introspector handles relevant and avoids bloating the report with, for example, data about third-party dependencies.
Bug sanitizers (e.g. ASAN), code coverage visualization and Fuzz Introspector all do part of their work at compile time. We can therefore focus our analysis by applying this instrumentation only to specific files. For example:
Only code that we instrument with coverage visualisation will show up in the code coverage report.
Only code that is instrumented with ASAN will have the necessary logic for detecting ASAN-reported bugs.
Only code that is compiled with Fuzz Introspector enabled will be included in the Fuzz Introspector report.
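The idea of instrumenting only selected files can be sketched as a per-file choice of compiler flags. This is an illustration only: the helper flags_for, the src/ and third_party/ layout and the file names are hypothetical, and the flags shown are the standard clang options for ASAN and source-based coverage. The flags used by Fuzz Introspector itself are deliberately left out.

```shell
# Hypothetical helper: pick compiler flags based on where a source file lives.
# First-party code under src/ gets ASAN and coverage instrumentation;
# everything else (for example third_party/) is compiled without it.
flags_for() {
  case "$1" in
    src/*) echo "-fsanitize=address -fprofile-instr-generate -fcoverage-mapping" ;;
    *)     echo "" ;;
  esac
}

# Print the compile command each file would get (nothing is compiled here).
for file in src/parser.c third_party/dep.c; do
  echo "clang -c $(flags_for "$file") $file"
done
```

With a policy like this, only src/parser.c would appear in the coverage report and carry ASAN checks, while third_party/dep.c would be absent from both.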
Note
How to instrument code is a wide topic, and can influence both performance and bug finding ability of a fuzzer. See e.g. https://blog.envoyproxy.io/a-stroll-down-fuzzer-optimisation-lane-and-why-instrumentation-policies-matter-f0012ec260b3 for a deeper discussion on some of this.
The Fuzz Introspector report depends on both the code analysed by the Fuzz Introspector frontend and the code coverage instrumentation, because the code coverage report is an input to Fuzz Introspector. In this guide we will not go into the internals of this, but rather show how instrumentation policies affect the output.
For this example we will use libarchive. libarchive depends on libxml. libxml is itself a large software package with tens of thousands of lines of code. This example shows how instrumentation policies matter by first doing a Fuzz Introspector run with both libarchive and libxml instrumented and included in the Fuzz Introspector analysis, and then doing a run with only libarchive compiled with the relevant instrumentation and libxml without it. This example is based on the PR that made this exact change: https://github.com/google/oss-fuzz/pull/9007.
Note
OSS-Fuzz sets environment variables when running build.sh, which causes the necessary instrumentation to be applied during the build. These environment variables are not set when the Docker image builds, i.e. building code in the Dockerfile of an OSS-Fuzz project means no instrumentation will be applied.
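To see this for a given build step, the step can print the variables it actually receives. The following is a minimal sketch: the function name show_instrumentation_env is hypothetical, and the variable names are assumed to follow the usual OSS-Fuzz build environment (CC, CXX, CFLAGS, CXXFLAGS, SANITIZER).

```shell
# Hypothetical helper: print the instrumentation-related variables that the
# current shell sees. Variable names are assumed from the usual OSS-Fuzz
# build environment; unset variables are reported as "(unset)".
show_instrumentation_env() {
  for name in CC CXX CFLAGS CXXFLAGS SANITIZER; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-(unset)}"
    printf '%s=%s\n' "$name" "$value"
  done
}

show_instrumentation_env
```

Called from build.sh, this would be expected to show the flags OSS-Fuzz has set. Called from a RUN step in the Dockerfile, the same variables would be expected to show up as unset.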
To show the difference between instrumenting and not instrumenting libxml, we will first do a Fuzz Introspector run of the OSS-Fuzz setup of libarchive with the build instructions for libxml inside the build.sh file. This corresponds to the state of the libarchive project before the changes in https://github.com/google/oss-fuzz/pull/9007/files were merged into OSS-Fuzz:
# Clone a clean version of oss-fuzz
git clone https://github.com/google/oss-fuzz
cd oss-fuzz
# At this point you need to:
# Revert the changes in the mentioned PR, so the instructions for building
# libxml are in projects/libarchive/build.sh and not in projects/libarchive/Dockerfile
# In other words, revert https://github.com/google/oss-fuzz/pull/9007
# Do this revert manually, as other things may have changed in the setup.
# Modify the libarchive build.sh to *not* have the lines:
# https://github.com/google/oss-fuzz/blob/65d4864780850058107f25d529710e84d2365acd/projects/libarchive/build.sh#L18-L24
# Generate an introspector report
python3 infra/helper.py introspector libarchive --seconds=10
# Save the introspector report to a path outside `build` so we have it
# for comparison purposes later.
cp -rf build/out/libarchive/introspector-report/ introspector-report-1
At this point we undo the changes made to the libarchive project, which we can achieve with git stash. We also clean up the build, and then repeat the same steps as above:
# Clean up build
sudo rm -rf ./build
# Undo the changes to the ``libarchive`` set up.
git stash
# Modify the libarchive build.sh to *not* have the lines:
# https://github.com/google/oss-fuzz/blob/65d4864780850058107f25d529710e84d2365acd/projects/libarchive/build.sh#L18-L24
# Generate an introspector report
python3 infra/helper.py introspector libarchive --seconds=10
# Save the introspector report to a path outside `build`.
cp -rf build/out/libarchive/introspector-report/ introspector-report-2
At this stage we have two different introspector reports: introspector-report-1, which holds the project with libxml included in the analysis, and introspector-report-2, which holds the project with libxml excluded from the analysis. We will now start two web servers so we can observe the differences in the reports.
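One way to serve the two reports is sketched below. It assumes Python 3.7 or newer is available for its built-in http.server module. The port numbers 8008 and 8009 are arbitrary choices, not something Fuzz Introspector requires.

```shell
# Serve each saved report on its own port, in the background.
# Ports 8008 and 8009 are arbitrary; pick any free ports.
python3 -m http.server 8008 --directory introspector-report-1 &
SERVER_1_PID=$!
python3 -m http.server 8009 --directory introspector-report-2 &
SERVER_2_PID=$!

# Browse http://localhost:8008/ and http://localhost:8009/ side by side.
# When done, stop both servers with:
#   kill "$SERVER_1_PID" "$SERVER_2_PID"
```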
The differences between the reports are clearly visible. For example, the total number of functions and the cyclomatic complexity differ between the two instances of the project. This is shown by the following two figures.
libarchive overview stats with libxml included in the analysis:
libarchive overview stats with libxml excluded from the analysis:
The difference is also visible elsewhere, for example in the project functions overview table.
The following figure shows the libarchive function overview with libxml included in the analysis. Notice that the search box queries for libxml to display the number of libxml functions, totalling 2607 entries:
libarchive function overview without libxml included in the analysis. Notice that the same query as above is in the search box, but no entries match the query: