Testing

Fuzz Testing

An automated testing technique that feeds random, unexpected, or malformed data to a program to discover crashes, memory leaks, and security vulnerabilities.

What Is Fuzz Testing?

Fuzz testing (or fuzzing) is an automated testing technique that feeds random, malformed, unexpected, or edge-case data to a program’s inputs and monitors the result for crashes, hangs, memory corruption, unhandled exceptions, and security vulnerabilities. Instead of testing known inputs with known expected outputs, fuzzing explores the vast space of possible inputs that developers never thought to test.

Fuzzing was first developed by Barton Miller at the University of Wisconsin in 1988, when he discovered that sending random characters to Unix command-line utilities caused a surprising number of them to crash. Since then, fuzzing has become one of the most effective techniques for finding security vulnerabilities. Google’s OSS-Fuzz project has found over 10,000 bugs in critical open-source software, including OpenSSL, SQLite, and the Linux kernel. Microsoft runs continuous fuzzing on Windows and Office products. Many high-profile vulnerabilities of the past decade, including Heartbleed in OpenSSL, were found through fuzzing or could have been caught by it before release.

There are three main categories of fuzzing. Dumb fuzzing generates purely random data with no understanding of the input format. Smart fuzzing (or generation-based fuzzing) uses knowledge of the input format (protocol specifications, file format grammars) to generate structurally valid but semantically unusual inputs. Coverage-guided fuzzing (used by tools like AFL, libFuzzer, and Jazzer) instruments the target program and uses code coverage feedback to evolve inputs that explore new code paths, combining the thoroughness of random testing with the efficiency of guided exploration.
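The difference between dumb and mutation-based fuzzing is easy to see in a toy sketch (illustrative only; the JSON seed is made up, and real mutators apply many more strategies):

```python
import random

def dumb_fuzz(length=64):
    # Dumb fuzzing: pure random bytes, no knowledge of the input format.
    return bytes(random.randrange(256) for _ in range(length))

def mutate(seed: bytes) -> bytes:
    # Mutation-based fuzzing: start from a valid input and corrupt it
    # slightly, so most of the structure survives validation checks.
    data = bytearray(seed)
    for _ in range(random.randint(1, 4)):
        choice = random.random()
        if choice < 0.5 and data:
            i = random.randrange(len(data))
            data[i] ^= 1 << random.randrange(8)        # flip one bit
        elif choice < 0.8 and data:
            del data[random.randrange(len(data))]      # drop a byte
        else:
            data.insert(random.randrange(len(data) + 1),
                        random.randrange(256))         # insert a random byte
    return bytes(data)

seed = b'{"name": "alice", "age": 30}'
print(mutate(seed))   # mostly valid JSON with small corruptions
print(dumb_fuzz(16))  # pure noise, unlikely to pass any parser
```

Mutated inputs tend to get past superficial validation and reach deeper code, which is why fuzzers are usually seeded with a corpus of valid inputs rather than starting from pure noise.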

How It Works

A fuzzer operates in a continuous loop: generate an input, feed it to the target function, observe the result, and repeat. Coverage-guided fuzzers add an intelligence layer by tracking which inputs exercise new code paths and mutating those inputs to explore further.
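In skeletal form the loop looks like this (a toy illustration, not a production fuzzer; the target is a stand-in function with a planted bug):

```python
import random

def parse(data: bytes) -> None:
    # Stand-in target with a planted bug: it "crashes" on one rare prefix.
    if data.startswith(b"\xde\xad"):
        raise RuntimeError("simulated crash")

def fuzz(target, iterations=10_000):
    """Generate an input, feed it to the target, observe, repeat."""
    crashes = []
    for _ in range(iterations):
        data = bytes(random.randrange(256)
                     for _ in range(random.randrange(64)))  # 1. generate
        try:
            target(data)                                    # 2. execute
        except Exception as exc:
            crashes.append((data, exc))                     # 3. record crash
    return crashes

crashes = fuzz(parse)
print(f"{len(crashes)} crashing inputs found")
```

A real fuzzer adds crash deduplication, input minimization, and (in the coverage-guided case) instrumentation feedback on top of this loop.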

Here is a simple example using Jest to fuzz-test a JSON parser in JavaScript:

// json-parser.fuzz.test.js
const { parseConfig } = require("./config-parser");

function generateRandomString(maxLength) {
  const length = Math.floor(Math.random() * maxLength);
  const chars = '{}[]":,0123456789abcdefnull true false \n\t\\';
  let result = "";
  for (let i = 0; i < length; i++) {
    result += chars[Math.floor(Math.random() * chars.length)];
  }
  return result;
}

describe("Config Parser Fuzz Tests", () => {
  test.each(Array.from({ length: 1000 }, (_, i) => [i]))(
    "does not crash on random input %i",
    () => {
      const randomInput = generateRandomString(500);
      // The function should either return a valid result
      // or throw a controlled error — never crash
      expect(() => {
        try {
          parseConfig(randomInput);
        } catch (e) {
          // Controlled exceptions are fine
          if (!(e instanceof SyntaxError || e instanceof TypeError)) {
            throw e; // Unexpected error types are bugs
          }
        }
      }).not.toThrow();
    }
  );
});

In Python, the hypothesis library provides property-based fuzzing:

# test_fuzz.py
from hypothesis import given, settings
from hypothesis import strategies as st
from parser import parse_user_input, sanitize_html
from calculator import add  # hypothetical module holding the add() under test

@given(st.text(min_size=0, max_size=10000))
@settings(max_examples=5000)
def test_parse_never_crashes(input_string):
    """The parser should handle any input without crashing."""
    try:
        result = parse_user_input(input_string)
        assert isinstance(result, dict)
    except ValueError:
        pass  # Expected for invalid input

@given(st.text())
def test_sanitize_removes_script_tags(html_input):
    """Sanitized output should never contain script tags."""
    result = sanitize_html(html_input)
    assert "<script" not in result.lower()
    assert "javascript:" not in result.lower()

@given(
    st.integers(min_value=-2**63, max_value=2**63 - 1),
    st.integers(min_value=-2**63, max_value=2**63 - 1),
)
def test_add_is_commutative(a, b):
    """Addition should be commutative for all integer inputs."""
    assert add(a, b) == add(b, a)

For compiled languages, dedicated fuzzing tools like AFL++ and libFuzzer are more powerful:

// fuzz_target.c — for use with libFuzzer
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include "parser.h"

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    char *input = malloc(size + 1);
    if (input == NULL) return 0;  // out of memory: skip this input
    memcpy(input, data, size);
    input[size] = '\0';           // NUL-terminate for the C string API

    parse_message(input);  // Should never crash

    free(input);
    return 0;
}

Coverage-guided fuzzers like AFL++ maintain a corpus of interesting inputs. When a new input triggers a previously unseen code path, it is saved to the corpus and mutated to generate further inputs. Over hours or days of continuous fuzzing, the corpus evolves to cover increasingly deep and unusual code paths.
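The corpus-evolution idea can be sketched in a few lines of toy code (real fuzzers track edge coverage via compiler instrumentation; here a fake coverage function pretends each distinct first byte is a distinct code path):

```python
import random

def toy_coverage(data: bytes):
    # Stand-in for instrumentation: each distinct first byte counts as a
    # distinct "code path". Real fuzzers record edges the compiler inserts.
    return {data[0]} if data else {-1}

def evolve(seeds, rounds=2000):
    corpus = list(seeds)
    seen = set()
    for s in corpus:
        seen |= toy_coverage(s)
    for _ in range(rounds):
        child = bytearray(random.choice(corpus))   # pick a corpus entry
        if child:
            i = random.randrange(len(child))
            child[i] ^= 1 << random.randrange(8)   # mutate one bit
        child = bytes(child)
        new_paths = toy_coverage(child) - seen
        if new_paths:                              # reached new code?
            seen |= new_paths
            corpus.append(child)                   # keep it as a new seed
    return corpus

corpus = evolve([b"GET /index.html"])
print(f"corpus grew to {len(corpus)} entries")
```

Only inputs that reach new code are kept, so the corpus stays small while its coverage keeps growing.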

Why It Matters

Fuzz testing discovers bugs that humans never think to test for. Developers write tests for inputs they expect: valid data, common error cases, and documented edge cases. Fuzzers explore the vast space of inputs that developers do not consider: malformed Unicode, extremely long strings, binary data in text fields, deeply nested structures, integer overflow values, and combinations of special characters.

This category of unexpected-input bugs is the root cause of most security vulnerabilities. Buffer overflows, injection attacks, denial-of-service via crafted inputs, and memory corruption are all caused by programs that behave incorrectly when they receive data outside the expected range. Fuzzing is the most effective automated technique for finding these vulnerabilities.

Fuzzing is also remarkably cost-effective. Once a fuzz test is set up, it runs continuously and autonomously, executing thousands to millions of inputs per second depending on the target. A single fuzzing campaign running overnight can find bugs that would take manual testers months to discover. The setup cost is low compared to the volume and severity of bugs found.

Best Practices

  • Fuzz at the input boundary. Target functions that parse external input: API request handlers, file parsers, protocol decoders, and form validators. These are the entry points where untrusted data enters the system.
  • Use coverage-guided fuzzers when possible. Coverage-guided tools like AFL++, libFuzzer, and Jazzer are dramatically more effective than random fuzzers because they intelligently evolve inputs to explore new code paths.
  • Run fuzzing continuously. Fuzzing is most effective over time. Set up continuous fuzzing in CI that runs for hours or days, not just minutes. Services like OSS-Fuzz provide free continuous fuzzing for open-source projects.
  • Maintain a corpus of interesting inputs. Save inputs that trigger new code paths or failures. Use this corpus as regression tests and as seeds for future fuzzing runs.
  • Define clear invariants. A fuzz test needs to know what “failure” looks like. Common invariants include: the function should never crash, should never produce output longer than 10x the input, should never leak memory, and should always produce valid output for valid input.
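Invariants like these can be folded into a reusable checker. A sketch using Python’s standard json module as an example target (the 10x bound is the illustrative limit from the bullet above, not a universal rule):

```python
import json

def check_invariants(parse, render, raw):
    # Invariant 1: only documented error types may escape.
    try:
        result = parse(raw)
    except ValueError:
        return  # controlled rejection of invalid input is acceptable
    # Invariant 2: output stays within a sane size bound of the input.
    out = render(result)
    assert len(out) <= 10 * max(len(raw), 1), "output blow-up"
    # Invariant 3: valid output round-trips through the parser.
    assert parse(out) == result, "round-trip mismatch"

check_invariants(json.loads, json.dumps, b'{"a": [1, 2, 3]}')   # valid input
check_invariants(json.loads, json.dumps, b"definitely not json")  # rejected cleanly
```

Feeding fuzzer-generated inputs through a checker like this turns “the program didn’t segfault” into a much stronger set of guarantees.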

Common Mistakes

  • Fuzzing only with random data. Pure random data rarely generates inputs that pass initial validation checks, so the fuzzer never reaches deep code paths. Use smart fuzzing or provide a seed corpus of valid inputs that the fuzzer can mutate.
  • Running fuzzing for only a few minutes. Many bugs are only found after millions of iterations. Short fuzzing runs provide little value. Allocate hours or days of compute time for effective fuzzing.
  • Ignoring the bugs fuzzing finds. Fuzzing often finds crashes in error handling code that developers dismiss as “that would never happen in production.” These are exactly the bugs that attackers exploit. Every crash found by a fuzzer should be investigated and fixed.
  • Not fuzzing after code changes. A function that was safe before a code change might become vulnerable after. Include fuzzing in CI or run it after significant changes to input-handling code.
