Compare introspector reports

Fuzz Introspector comes with a diff command for comparing two Fuzz Introspector runs. The comparison is based on the summary.json file that Fuzz Introspector produces, which holds much of the data Fuzz Introspector generates in JSON format.
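
If you are curious about what summary.json contains, a quick way to peek at it is sketched below. This is just an illustration, not part of the diff workflow, and the exact contents depend on the Fuzz Introspector version.

# Print the top-level keys of a summary.json file (illustrative only; the
# exact keys depend on the Fuzz Introspector version).
python3 -c "import json, sys; print(list(json.load(open(sys.argv[1])).keys()))" summary.json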

The diff command takes two summary.json files as arguments and highlights important differences between the reports, e.g. in coverage and reachability.

This user guide shows how to use the diff command to compare two fuzzing reports. This is useful when assessing whether any regressions or improvements have occurred, for example while developing extensions to a given fuzzing suite.

The diffing currently supported by Fuzz Introspector is not yet integrated into OSS-Fuzz. As such, to use this feature you need a recent local checkout of Fuzz Introspector. For the sake of completeness we will assume the setup achieved by the following commands:

# Get a local copy of the Fuzz Introspector code and install Python dependencies.
git clone https://github.com/ossf/fuzz-introspector
cd fuzz-introspector
git submodule init
git submodule update

python3 -m virtualenv .venv
. .venv/bin/activate
pip3 install -r ./requirements.txt

# Clone OSS-Fuzz
cd oss_fuzz_integration
./build_post_processing.sh

cd oss-fuzz

The following assumes you're in the oss-fuzz directory, i.e. where the commands above leave you.

The next step is to generate two Fuzz Introspector reports whose results differ. There are two kinds of differences we are often interested in: increases in code coverage and increases in reachability.

Example of runtime coverage improvements

The following assumes you’re in the oss-fuzz directory as generated above and that the virtual environment is activated.

In the context of code coverage, we will run the exact same project twice, giving the fuzzers a different number of seconds each time.

# Generate an introspector report based on 10 seconds of runtime.
# Then save the generated JSON file and clean up.
python3 infra/helper.py introspector htslib --seconds=10
cp ./build/out/htslib/inspector/summary.json summary_first_run.json
sudo rm -rf ./build

# Do another run for 300 seconds
python3 infra/helper.py introspector htslib --seconds=300
cp ./build/out/htslib/inspector/summary.json summary_second_run.json

At this point we have two .json files with data from two Fuzz Introspector runs: one based on a corpus generated over 10 seconds of fuzzing and the other on a corpus generated over 300 seconds.
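
Before diffing, it can be worth a quick sanity check that both files exist and parse as JSON. A minimal sketch:

# Sanity check: both report files should exist and parse as JSON.
python3 -c "import json; [json.load(open(f)) for f in ('summary_first_run.json', 'summary_second_run.json')]; print('both reports parsed OK')"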

We can now compare the two runs using the Fuzz Introspector diff command:

$ python3 ../../src/main.py diff \
  --report1 ./summary_first_run.json \
  --report2 ./summary_second_run.json

INFO:__main__:Running fuzz introspector post-processing
Report 2 has similar Total complexity to report 1 - {report 1: 16627 / report 2: 16627})

## Code coverge comparison
The following functions report 2 has decreased code coverage:
Report 2 has less coverage {  64.0 vs   46.0} for bcf_hdr_read

The following functions report 2 has increased code coverage:
Report 2 has more coverage {   0.0 vs   60.0} for sam_hrecs_find_key
Report 2 has more coverage {   0.0 vs  100.0} for TYPEKEY
Report 2 has more coverage {   0.0 vs  78.04} for kh_put_sam_hrecs_t
Report 2 has more coverage {   0.0 vs   80.0} for sam_hrecs_global_list_add
Report 2 has more coverage {   0.0 vs   29.8} for sam_hrecs_update_hashes
Report 2 has more coverage {   0.0 vs  54.83} for kh_resize_sam_hrecs_t
Report 2 has more coverage {   0.0 vs  100.0} for isalpha_c
Report 2 has more coverage {   0.0 vs   87.5} for sam_hrecs_error
Report 2 has more coverage {  40.0 vs   60.0} for sam_hdr_fill_hrecs
Report 2 has more coverage {   0.0 vs  100.0} for redact_header_text
Report 2 has more coverage { 80.64 vs  83.87} for sam_hrecs_free
Report 2 has more coverage {   0.0 vs  92.23} for sam_hrecs_parse_lines
Report 2 has more coverage {   0.0 vs  19.35} for sam_hdr_update_target_arrays
Report 2 has more coverage {   0.0 vs  78.57} for sam_hrecs_rebuild_lines
Report 2 has more coverage {   0.0 vs  100.0} for build_header_line
Report 2 has more coverage { 34.28 vs  37.14} for sam_hdr_count_lines
Report 2 has more coverage {   0.0 vs  84.21} for sam_hdr_add_lines
Report 2 has more coverage {   0.0 vs  100.0} for ks_release
Report 2 has more coverage { 55.55 vs  88.88} for sam_hrecs_rebuild_text
Report 2 has more coverage { 30.55 vs  52.77} for hseek
Report 2 has more coverage {   0.0 vs  100.0} for hgetc2
Report 2 has more coverage { 66.48 vs  81.86} for hts_detect_format2
Report 2 has more coverage {   0.0 vs  100.0} for decompress_peek_gz
Report 2 has more coverage {   0.0 vs   65.0} for parse_version
Report 2 has more coverage {   0.0 vs  68.29} for hts_resize_array_
Report 2 has more coverage { 67.92 vs  92.45} for hts_close
Report 2 has more coverage {  76.1 vs   82.3} for hts_hopen
Report 2 has more coverage {   0.0 vs  100.0} for kh_destroy_s2i
Report 2 has more coverage {   0.0 vs  100.0} for kh_init_s2i
Report 2 has more coverage {   0.0 vs  34.84} for sam_parse1
Report 2 has more coverage {   0.0 vs  66.66} for possibly_expand_bam_data
Report 2 has more coverage {   0.0 vs  100.0} for parse_sam_flag
Report 2 has more coverage {   0.0 vs  46.42} for hts_str2uint
Report 2 has more coverage {   0.0 vs  100.0} for known_stderr
Report 2 has more coverage {   0.0 vs  100.0} for valid_sam_header_type
Report 2 has more coverage {   0.0 vs  100.0} for warn_if_known_stderr
Report 2 has more coverage { 57.74 vs  63.38} for sam_format1_append
Report 2 has more coverage { 54.91 vs  67.21} for fastq_parse1
...
...

The output of the diff command shows the difference achieved, namely that for a large number of functions the second report (from the longer run) has more code coverage.
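
If you want a rough summary rather than the full list, the textual output above can be filtered. The sketch below simply counts the "more coverage" and "less coverage" lines; it relies on the exact wording of the output shown above, which may change between Fuzz Introspector versions.

# Count how many functions improved vs. regressed, based on the wording of
# the diff output shown above.
python3 ../../src/main.py diff \
  --report1 ./summary_first_run.json \
  --report2 ./summary_second_run.json > coverage_diff.txt
grep -c "has more coverage" coverage_diff.txt
grep -c "has less coverage" coverage_diff.txt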

Example of reachability differences

In the context of reachability, more effort is needed than simply running the same project twice for a different number of seconds (as done in Example of runtime coverage improvements). To produce reachability differences we need to change the actual code, since the reachability analysis is based on static analysis.

To demonstrate reachability differences we will use the libarchive OSS-Fuzz integration. We will first run it with a limited version of the fuzzer, and then with the full version.

First, comment out the lines at https://github.com/google/oss-fuzz/blob/a8cb9370f0dddf33111b1a7ce6d715633d5400df/projects/libarchive/libarchive_fuzzer.cc#L39-L73 so that the fuzzer only exercises a small part of libarchive.
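
If you prefer to make this change from the command line, the sketch below prefixes each of those lines with //. It assumes the file in your oss-fuzz checkout matches the pinned revision linked above (so the code to disable really sits on lines 39-73) and that GNU sed is available.

# Comment out lines 39-73 of the libarchive fuzzer (GNU sed, in-place edit).
sed -i '39,73s|^|// |' projects/libarchive/libarchive_fuzzer.cc

Then, we build the introspector report using a 1 second runtime: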

# Generate an introspector report based on 1 second runtime with our
# modified libarchive fuzzer.
python3 infra/helper.py introspector libarchive --seconds=1
cp ./build/out/libarchive/inspector/summary.json libarchive_first_run.json
sudo rm -rf ./build

Then, we restore the lines we commented out so we have the original fuzzer again.
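
If the change was made directly in the oss-fuzz git checkout, one way to undo it (a sketch, assuming no other local edits to that file) is:

# Restore the original fuzzer source.
git checkout -- projects/libarchive/libarchive_fuzzer.cc

With the original fuzzer back in place, we do a similar run: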

python3 infra/helper.py introspector libarchive --seconds=1
cp ./build/out/libarchive/inspector/summary.json libarchive_second_run.json

At this point we have collected the two reports, each based on a different version of the fuzzer. We now run our diff command on the two reports:

$ python3 ../../src/main.py diff \
  --report1 ./libarchive_first_run.json \
  --report2 ./libarchive_second_run.json

 INFO:__main__:Running fuzz introspector post-processing
 Report 2 has a larger Total complexity than report 1 - {report 1: 9763 / report 2: 9787})

 ## Code coverge comparison
 ...
 ...

 ## Reachability comparison
 The following functions are only reachable in report 1:
 - All functions reachable in report 1 are reachable in report 2

 The following functions are only reachable in report 2:
 archive_read_data
 mbrtowc
 get_current_oemcp
 default_iconv_charset
 nl_langinfo
 get_current_codepage
 archive_string_conversion_from_charset
 archive_strncpy_l
 free_sconv_object
 archive_wstring_append_from_mbs
 iconv_close
 archive_strncat_l
 utf16nbytes
 mbsnbytes
 get_current_charset
 archive_mstring_get_mbs
 archive_mstring_get_wcs
 archive_mstring_get_utf8
 archive_string_conversion_to_charset
 archive_read_data_block
 archive_read_next_header
 archive_entry_digest
 archive_entry_is_encrypted
 archive_entry_is_metadata_encrypted
 archive_entry_is_data_encrypted
 archive_entry_uid
 archive_entry_size
 gnu_dev_makedev
 archive_entry_pathname_w
 archive_entry_pathname_utf8
 archive_entry_pathname
 archive_entry_mtime
 archive_entry_gid
 archive_entry_filetype
 archive_entry_dev
 archive_entry_ctime
 archive_entry_birthtime
 archive_entry_atime
 INFO:__main__:Ending fuzz introspector post-processing

We can observe that many more functions are reachable in the second run, which uses the version of the fuzzer with no code commented out. Furthermore, many of the functions that are only reachable in the second report correspond to functions called from the code we commented out in the first run.
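
When comparing fuzzer variants like this, it can be handy to keep the diff output around, for example to track how the set of newly reachable functions evolves while you develop the fuzzer. A simple way to do that is to redirect the output to a file:

# Save the diff output for later inspection.
python3 ../../src/main.py diff \
  --report1 ./libarchive_first_run.json \
  --report2 ./libarchive_second_run.json > libarchive_diff.txt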