Compare introspector reports
============================

Fuzz Introspector comes with a ``diff`` command to compare two Fuzz
Introspector runs. This is achieved by way of the ``summary.json`` file,
which Fuzz Introspector produces and which holds much of the data Fuzz
Introspector generates in JSON format. The ``diff`` command takes two
``summary.json`` files as arguments and highlights important differences
between the reports, e.g. in coverage and reachability.

This user guide shows how to use the ``diff`` command to compare two
fuzzing reports. This is useful when assessing whether any regressions or
improvements have happened, for example while developing extensions to a
given fuzzing suite.

The diffing currently supported by Fuzz Introspector is not yet integrated
into OSS-Fuzz. As such, to use this feature you need a recent local version
of Fuzz Introspector. For the sake of completeness, we assume the setup
achieved with the following commands:

.. code-block:: bash

   # Get a local copy of Fuzz Introspector and install Python dependencies.
   git clone https://github.com/ossf/fuzz-introspector
   cd fuzz-introspector
   git submodule init
   git submodule update
   python3 -m virtualenv .venv
   . .venv/bin/activate
   pip3 install -r ./requirements.txt

   # Clone OSS-Fuzz
   cd oss_fuzz_integration
   ./build_post_processing.sh
   cd oss-fuzz

The rest of this guide assumes you're in the ``oss-fuzz`` directory, as at
the end of the commands above.

The next step is to generate two Fuzz Introspector reports whose results
differ. There are two kinds of results we're often interested in:
increases in code coverage and increases in reachability.

Example of runtime coverage improvements
----------------------------------------

The following assumes you're in the ``oss-fuzz`` directory as generated
above and that the virtual environment is activated.
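
Both assumptions can be checked before starting a long run. The sketch
below is a hypothetical helper (``check_introspector_setup`` is not part of
Fuzz Introspector or OSS-Fuzz). It only relies on ``infra/helper.py``
existing in the OSS-Fuzz checkout and on the ``VIRTUAL_ENV`` variable that a
virtual environment's ``activate`` script sets:

.. code-block:: bash

   # Hypothetical helper: verify the two assumptions this guide makes.
   check_introspector_setup() {
     # OSS-Fuzz's helper script must be reachable from the current directory.
     if [ ! -f infra/helper.py ]; then
       echo "error: infra/helper.py not found; cd into the oss-fuzz directory" >&2
       return 1
     fi
     # The activate script exports VIRTUAL_ENV; it is unset otherwise.
     if [ -z "${VIRTUAL_ENV:-}" ]; then
       echo "error: no virtualenv active; run '. ../../.venv/bin/activate'" >&2
       return 1
     fi
     echo "ok: $(pwd)"
   }

Run ``check_introspector_setup`` from the ``oss-fuzz`` directory; it
returns a non-zero status and prints the reason if either assumption does
not hold.
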

In the context of code coverage, we will run the exact same project twice,
using a different number of seconds for the fuzzers.

.. code-block:: bash

   # Generate an introspector report based on 10 seconds of runtime.
   # Then save the generated JSON file and clean up.
   python3 infra/helper.py introspector htslib --seconds=10
   cp ./build/out/htslib/inspector/summary.json summary_first_run.json
   sudo rm -rf ./build

   # Do another run for 300 seconds
   python3 infra/helper.py introspector htslib --seconds=300
   cp ./build/out/htslib/inspector/summary.json summary_second_run.json

At this point we have two ``.json`` files with data from two Fuzz
Introspector runs. The difference between the runs is that one is based on
a corpus generated over 10 seconds and the other on a corpus generated over
300 seconds. We can now compare the two runs using the Fuzz Introspector
``diff`` command:

.. code-block:: console

   $ python3 ../../src/main.py diff \
       --report1 ./summary_first_run.json \
       --report2 ./summary_second_run.json
   INFO:__main__:Running fuzz introspector post-processing
   Report 2 has similar Total complexity to report 1 - {report 1: 16627 / report 2: 16627})
   ## Code coverge comparison
   The following functions report 2 has decreased code coverage:
   Report 2 has less coverage { 64.0 vs 46.0} for bcf_hdr_read
   The following functions report 2 has increased code coverage:
   Report 2 has more coverage { 0.0 vs 60.0} for sam_hrecs_find_key
   Report 2 has more coverage { 0.0 vs 100.0} for TYPEKEY
   Report 2 has more coverage { 0.0 vs 78.04} for kh_put_sam_hrecs_t
   Report 2 has more coverage { 0.0 vs 80.0} for sam_hrecs_global_list_add
   Report 2 has more coverage { 0.0 vs 29.8} for sam_hrecs_update_hashes
   Report 2 has more coverage { 0.0 vs 54.83} for kh_resize_sam_hrecs_t
   Report 2 has more coverage { 0.0 vs 100.0} for isalpha_c
   Report 2 has more coverage { 0.0 vs 87.5} for sam_hrecs_error
   Report 2 has more coverage { 40.0 vs 60.0} for sam_hdr_fill_hrecs
   Report 2 has more coverage { 0.0 vs 100.0} for redact_header_text
   Report 2 has more coverage { 80.64 vs 83.87} for sam_hrecs_free
   Report 2 has more coverage { 0.0 vs 92.23} for sam_hrecs_parse_lines
   Report 2 has more coverage { 0.0 vs 19.35} for sam_hdr_update_target_arrays
   Report 2 has more coverage { 0.0 vs 78.57} for sam_hrecs_rebuild_lines
   Report 2 has more coverage { 0.0 vs 100.0} for build_header_line
   Report 2 has more coverage { 34.28 vs 37.14} for sam_hdr_count_lines
   Report 2 has more coverage { 0.0 vs 84.21} for sam_hdr_add_lines
   Report 2 has more coverage { 0.0 vs 100.0} for ks_release
   Report 2 has more coverage { 55.55 vs 88.88} for sam_hrecs_rebuild_text
   Report 2 has more coverage { 30.55 vs 52.77} for hseek
   Report 2 has more coverage { 0.0 vs 100.0} for hgetc2
   Report 2 has more coverage { 66.48 vs 81.86} for hts_detect_format2
   Report 2 has more coverage { 0.0 vs 100.0} for decompress_peek_gz
   Report 2 has more coverage { 0.0 vs 65.0} for parse_version
   Report 2 has more coverage { 0.0 vs 68.29} for hts_resize_array_
   Report 2 has more coverage { 67.92 vs 92.45} for hts_close
   Report 2 has more coverage { 76.1 vs 82.3} for hts_hopen
   Report 2 has more coverage { 0.0 vs 100.0} for kh_destroy_s2i
   Report 2 has more coverage { 0.0 vs 100.0} for kh_init_s2i
   Report 2 has more coverage { 0.0 vs 34.84} for sam_parse1
   Report 2 has more coverage { 0.0 vs 66.66} for possibly_expand_bam_data
   Report 2 has more coverage { 0.0 vs 100.0} for parse_sam_flag
   Report 2 has more coverage { 0.0 vs 46.42} for hts_str2uint
   Report 2 has more coverage { 0.0 vs 100.0} for known_stderr
   Report 2 has more coverage { 0.0 vs 100.0} for valid_sam_header_type
   Report 2 has more coverage { 0.0 vs 100.0} for warn_if_known_stderr
   Report 2 has more coverage { 57.74 vs 63.38} for sam_format1_append
   Report 2 has more coverage { 54.91 vs 67.21} for fastq_parse1
   ...
   ...
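
With longer runs this list grows quickly. A minimal sketch for ranking the
per-function changes by size follows. ``rank_coverage_changes`` is a
hypothetical helper, not part of Fuzz Introspector, and it assumes the line
format shown in the output above
(``Report 2 has more coverage { <before> vs <after>} for <function>``),
which may change between Fuzz Introspector versions:

.. code-block:: bash

   # Hypothetical helper (not part of Fuzz Introspector): read the text
   # output of the diff command on stdin and print "<delta> <function>"
   # for every per-function coverage line, largest gain first.
   # Assumes lines of the form:
   #   Report 2 has more coverage { 0.0 vs 60.0} for sam_hrecs_find_key
   rank_coverage_changes() {
     awk '/has (more|less) coverage/ {
       line = $0
       gsub(/[{}]/, " ", line)   # detach the braces from the numbers
       n = split(line, f, " ")   # split on runs of whitespace
       # f[6] = coverage in report 1, f[8] = coverage in report 2,
       # f[n] = function name
       printf "%.2f %s\n", f[8] - f[6], f[n]
     }' | LC_ALL=C sort -k1,1rn
   }

For example, appending ``| rank_coverage_changes | head -n 10`` to the
``diff`` command above lists the ten functions whose coverage changed the
most. Sorting on the difference, rather than on the second report's value
alone, surfaces the functions that benefited most from the longer run, and
functions with decreased coverage end up at the bottom with a negative
delta.
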

The output of the ``diff`` command shows the difference achieved: for a
large number of functions, the second report (with the longer run) has
more code coverage.

Example of reachability differences
-----------------------------------

Reachability takes more effort than simply running the same project twice
with a different number of seconds (as done in
:ref:`Example of runtime coverage improvements`). To display reachability
differences we need to change the actual code, because the reachability
analysis is based on static analysis.

To display reachability differences we will use the ``libarchive``
OSS-Fuzz integration. We will first run it with a limited version of the
fuzzer, and then with the full version.

First, comment out the lines at
https://github.com/google/oss-fuzz/blob/a8cb9370f0dddf33111b1a7ce6d715633d5400df/projects/libarchive/libarchive_fuzzer.cc#L39-L73

Then, build the introspector report using a 1 second runtime:

.. code-block:: bash

   # Generate an introspector report based on 1 second runtime with our
   # modified libarchive fuzzer.
   python3 infra/helper.py introspector libarchive --seconds=1
   cp ./build/out/libarchive/inspector/summary.json libarchive_first_run.json
   sudo rm -rf ./build

Then, uncomment the lines again so that we have the original fuzzer, and
do a similar run:

.. code-block:: bash

   python3 infra/helper.py introspector libarchive --seconds=1
   cp ./build/out/libarchive/inspector/summary.json libarchive_second_run.json

At this point we have collected the two reports, each generated with a
different version of the fuzzer. We now run the ``diff`` command on the two
reports:

.. code-block:: console

   $ python3 ../../src/main.py diff \
       --report1 ./libarchive_first_run.json \
       --report2 ./libarchive_second_run.json
   INFO:__main__:Running fuzz introspector post-processing
   Report 2 has a larger Total complexity than report 1 - {report 1: 9763 / report 2: 9787})
   ## Code coverge comparison
   ...
   ...
   ## Reachability comparison
   The following functions are only reachable in report 1:
   - All functions reachable in report 1 are reachable in report 2
   The following functions are only reachable in report 2:
   archive_read_data
   mbrtowc
   get_current_oemcp
   default_iconv_charset
   nl_langinfo
   get_current_codepage
   archive_string_conversion_from_charset
   archive_strncpy_l
   free_sconv_object
   archive_wstring_append_from_mbs
   iconv_close
   archive_strncat_l
   utf16nbytes
   mbsnbytes
   get_current_charset
   archive_mstring_get_mbs
   archive_mstring_get_wcs
   archive_mstring_get_utf8
   archive_string_conversion_to_charset
   archive_read_data_block
   archive_read_next_header
   archive_entry_digest
   archive_entry_is_encrypted
   archive_entry_is_metadata_encrypted
   archive_entry_is_data_encrypted
   archive_entry_uid
   archive_entry_size
   gnu_dev_makedev
   archive_entry_pathname_w
   archive_entry_pathname_utf8
   archive_entry_pathname
   archive_entry_mtime
   archive_entry_gid
   archive_entry_filetype
   archive_entry_dev
   archive_entry_ctime
   archive_entry_birthtime
   archive_entry_atime
   INFO:__main__:Ending fuzz introspector post-processing

We can observe that indeed many more functions are reachable in the second
run, which used the version of the fuzzer with no code commented out.
Furthermore, many of the functions that are reachable only in the second
report correspond to functions called by the code we commented out in the
first run.
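
The list of newly reachable functions can also be extracted for further
processing, for example to check whether a particular function of interest
became reachable. A minimal sketch follows; ``reachable_only_in_report2``
is a hypothetical helper, not part of Fuzz Introspector, and it assumes the
output format shown above, with one function name per line after the
``only reachable in report 2:`` header. That format may differ between
Fuzz Introspector versions:

.. code-block:: bash

   # Hypothetical helper (not part of Fuzz Introspector): read the text
   # output of the diff command on stdin and print the functions listed
   # under "The following functions are only reachable in report 2:".
   # Assumes one function name per line, as in the output shown above.
   reachable_only_in_report2() {
     awk '
       /only reachable in report 2:/ { in_list = 1; next }  # list starts
       /^(INFO:|##|The following)/   { in_list = 0 }        # list ends
       in_list {
         name = $0
         gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim surrounding whitespace
         # Skip blank lines and notes such as "- All functions reachable ..."
         if (name != "" && name !~ /^- /) print name
       }'
   }

For example, piping the ``diff`` output into
``reachable_only_in_report2 | grep -qx archive_read_next_header`` succeeds
only if ``archive_read_next_header`` became reachable in the second report,
and ``reachable_only_in_report2 | wc -l`` counts the newly reachable
functions.
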