A Method for Finding Missing Unit Tests
- Author: Shahbaz Zaidi (@_zaidi_shahbaz)
The Coverage Illusion
Code coverage is the most common metric for test quality. If 80% of lines are executed during tests, you have 80% coverage. But coverage is a weak measure. A test can execute code without verifying its correctness.
Consider:
def calculate_discount(price, is_member):
    if is_member:
        return price * 0.9
    return price

def test_discount():
    result = calculate_discount(100, True)
    # No assertion!
This test executes the member branch, and every line it runs counts as covered, yet it verifies nothing. The function could return price * 0.5 and the test would still pass.
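For contrast, a version of the test that checks behavior might look like the sketch below. The test names are illustrative rather than taken from the original post, and `math.isclose` is used only to keep the float comparison robust.

```python
import math


def calculate_discount(price, is_member):
    if is_member:
        return price * 0.9
    return price


def test_member_gets_ten_percent_off():
    # Fails if the multiplier is changed, for example to 0.5.
    assert math.isclose(calculate_discount(100, True), 90.0)


def test_non_member_pays_full_price():
    # Exercises the non-member branch as well.
    assert calculate_discount(100, False) == 100


# Run the tests directly; a test runner such as pytest would also collect them.
test_member_gets_ten_percent_off()
test_non_member_pays_full_price()
```

Coverage tools would report the same lines as executed, but now a wrong multiplier or a swapped branch makes a test fail.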
The paper addresses a harder question: which tests are missing? Not "is code covered?" but "is behavior verified?"
Mutation Testing Basics
Mutation testing systematically introduces bugs (mutations) into code and checks if tests catch them. A mutant is a modified version of the code with one small change:
- Replace < with <=
- Replace + with -
- Replace True with False
- Delete a statement
If tests fail when a mutant is introduced, the mutant is "killed." If tests pass, the mutant "survives," indicating a gap in what the tests verify.
The mutation score measures test effectiveness:
$$\text{mutation score} = \frac{\text{killed mutants}}{\text{total mutants}}$$
A test suite with a 90% mutation score catches 90% of the synthetic bugs. Mutation score correlates better with real bug detection than line coverage does.
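The kill/survive bookkeeping can be illustrated with a toy sketch. It assumes hand-written mutants of the discount function and models a test suite as a function that returns True when all its checks pass; real tools generate mutants automatically, and every name below is hypothetical.

```python
import math


def original(price, is_member):
    if is_member:
        return price * 0.9
    return price


# Hand-written mutants, each with one small change.
def mutant_multiplier(price, is_member):
    if is_member:
        return price * 0.5  # 0.9 replaced with 0.5
    return price


def mutant_negated_condition(price, is_member):
    if not is_member:  # condition negated
        return price * 0.9
    return price


def mutant_operator(price, is_member):
    if is_member:
        return price / 0.9  # * replaced with /
    return price


MUTANTS = [mutant_multiplier, mutant_negated_condition, mutant_operator]


def weak_suite(fn):
    fn(100, True)  # runs the code but asserts nothing
    return True


def strong_suite(fn):
    return math.isclose(fn(100, True), 90.0) and fn(100, False) == 100


def mutation_score(suite, mutants):
    # A mutant is killed when the suite fails against it.
    killed = sum(1 for mutant in mutants if not suite(mutant))
    return killed / len(mutants)


print(mutation_score(weak_suite, MUTANTS))    # 0.0
print(mutation_score(strong_suite, MUTANTS))  # 1.0
```

Both suites execute the same lines of the function, so line coverage cannot tell them apart; the mutation score can.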
The Combinatorial Problem
Mutation testing is computationally expensive. For a codebase with $n$ mutation points and $m$ possible mutations per point, the total number of mutants is $n \times m$. Each mutant requires running the full test suite.
For a modest project with 10,000 lines and 5 mutations per line:
$$10{,}000 \times 5 = 50{,}000 \text{ mutants}$$
The paper proposes techniques to reduce this cost.
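To get a feel for the cost being reduced, the arithmetic can be turned into a rough time estimate. The 60-second suite runtime below is purely an assumption for illustration; the post gives no figure.

```python
def full_run_cost(lines, mutations_per_line, suite_seconds):
    """Return (mutant count, hours) for a naive run of the full suite per mutant."""
    mutants = lines * mutations_per_line
    hours = mutants * suite_seconds / 3600
    return mutants, hours


# suite_seconds=60 is an assumed value, not a measurement.
mutants, hours = full_run_cost(10_000, 5, suite_seconds=60)
print(mutants, round(hours, 1))
```

Under that assumption the naive run takes several hundred hours of compute, which is why running every mutant against every test is rarely feasible.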
Prioritizing Mutations
Not all mutations are equally valuable. A mutation in dead code or error handling for impossible conditions doesn't represent real risk.
The method prioritizes mutations in:
- Code with high cyclomatic complexity (more branches = more logic to verify)
- Recently changed code (more likely to contain bugs)
- Code with low existing test coverage (obvious gaps)
By scoring code regions on these factors, mutation testing can focus on high-value areas.
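One way such a score could be computed is sketched below. The weights, the normalization constants, and the `Region` fields are all assumptions made for illustration; they are not the paper's formula.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    cyclomatic_complexity: int
    days_since_change: int
    line_coverage: float  # fraction between 0.0 and 1.0


def priority(region, w_complexity=1.0, w_recency=1.0, w_coverage=1.0):
    # Each factor is mapped to roughly [0, 1]; the constants are arbitrary choices.
    complexity = min(region.cyclomatic_complexity / 10, 1.0)
    recency = 1.0 / (1.0 + region.days_since_change / 30)
    uncovered = 1.0 - region.line_coverage
    return w_complexity * complexity + w_recency * recency + w_coverage * uncovered


# Hypothetical regions with made-up metrics.
regions = [
    Region("parse_config", cyclomatic_complexity=12, days_since_change=2, line_coverage=0.40),
    Region("format_date", cyclomatic_complexity=2, days_since_change=400, line_coverage=0.95),
    Region("apply_refund", cyclomatic_complexity=7, days_since_change=30, line_coverage=0.70),
]

ranked = sorted(regions, key=priority, reverse=True)
print([r.name for r in ranked])
```

Mutants would then be generated for the top-ranked regions first, with the budget running out before the low-risk regions are reached.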
Equivalent Mutants
Some mutations don't change program behavior. These "equivalent mutants" cannot be killed because they're semantically identical to the original.
# Original
for i in range(0, n):
    process(i)

# Equivalent mutant
for i in range(0, n, 1):
    process(i)
Equivalent mutants pollute mutation scores and waste testing effort. The paper uses static analysis to identify and filter likely equivalents.
A mutant is equivalent to the original if it produces the same output for all inputs in the program's domain. Determining this precisely is undecidable, but heuristics catch common cases:
- Mutations in unreachable code
- Mutations that cancel out (e.g., x + 1 - 1)
- Mutations in logging or debug statements
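The definition above can be illustrated with a dynamic check that compares outputs over sampled inputs. This is a stand-in for illustration only: the paper is described as using static analysis, and sampling can flag a mutant as likely equivalent but never prove it. The function names are hypothetical.

```python
def likely_equivalent(original, mutant, sample_inputs):
    """Return False as soon as one sampled input distinguishes the two functions."""
    for args in sample_inputs:
        if original(*args) != mutant(*args):
            return False
    return True  # no difference observed on the samples; not a proof


def total(n):
    acc = 0
    for i in range(0, n):
        acc += i
    return acc


def total_step_mutant(n):
    acc = 0
    for i in range(0, n, 1):  # explicit default step: same behavior
        acc += i
    return acc


def total_bound_mutant(n):
    acc = 0
    for i in range(0, n + 1):  # off-by-one: different behavior for n > 0
        acc += i
    return acc


samples = [(n,) for n in range(0, 20)]
print(likely_equivalent(total, total_step_mutant, samples))   # True
print(likely_equivalent(total, total_bound_mutant, samples))  # False
```

A mutant flagged this way is a candidate for exclusion from the score, ideally after a quick manual look.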
Test Generation Suggestions
When a mutant survives, the method suggests what test is missing. It analyzes:
- Which code path contains the surviving mutant
- What input values reach that path
- What assertion would distinguish mutant from original
For example, if a mutant changes price * 0.9 to price * 0.8, the suggestion might be:
Missing test: verify discount calculation
Input: price=100, is_member=True
Expected: 90
Actual (mutant): 80
This gives developers actionable feedback rather than just "your tests are incomplete."
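A minimal sketch of how such a suggestion could be produced: search candidate inputs for one where the original and the mutant disagree, then report it. The candidate list and the `suggest_missing_test` helper are hypothetical; the paper's actual analysis of paths and inputs is not shown in this post.

```python
def calculate_discount(price, is_member):
    if is_member:
        return price * 0.9
    return price


def calculate_discount_mutant(price, is_member):
    if is_member:
        return price * 0.8  # surviving mutant: 0.9 replaced with 0.8
    return price


def suggest_missing_test(original, mutant, candidate_inputs):
    """Return the first input that distinguishes the mutant, or None if none does."""
    for args in candidate_inputs:
        expected = original(*args)
        actual = mutant(*args)
        if expected != actual:
            return {"input": args, "expected": expected, "actual_mutant": actual}
    return None


# Candidate inputs are assumed here; a real tool would derive them from the code path.
candidates = [(100, False), (100, True)]
suggestion = suggest_missing_test(calculate_discount, calculate_discount_mutant, candidates)
print(suggestion)
```

The non-member input cannot distinguish the two versions, so the search moves on and reports the member case, which is the assertion the suite is missing.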
Incremental Analysis
The paper emphasizes incremental application. Running full mutation analysis on every commit is impractical. Instead:
- On each commit, identify changed functions
- Generate mutations only for changed code
- Run only tests that cover changed code
- Report surviving mutants
This reduces analysis time from hours to minutes while focusing on code most likely to contain new bugs.
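The selection step in that workflow can be sketched as follows, assuming a coverage map from test names to the functions each test executes is already available. The map, the function names, and the `select_for_commit` helper are hypothetical.

```python
def select_for_commit(changed_functions, coverage_map):
    """Pick the functions to mutate and the tests to run for one commit.

    coverage_map maps each test name to the set of functions it executes.
    """
    changed = set(changed_functions)
    tests = sorted(
        test for test, covered in coverage_map.items() if covered & changed
    )
    return {"mutate": sorted(changed), "run_tests": tests}


# Hypothetical coverage data for a small project.
coverage_map = {
    "test_discount": {"calculate_discount"},
    "test_checkout": {"calculate_discount", "compute_tax"},
    "test_login": {"authenticate"},
}

plan = select_for_commit(["calculate_discount"], coverage_map)
print(plan)
```

Here `test_login` is skipped because it never touches the changed function, so no mutant in this commit could make it fail.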
Practical Limitations
Mutation testing assumes that small syntactic changes represent realistic bugs. This isn't always true. Real bugs often involve:
- Misunderstood requirements (logic is wrong, not just off-by-one)
- Integration issues (works in isolation, fails in combination)
- Concurrency bugs (timing-dependent, hard to synthesize)
The paper acknowledges these limitations. Mutation testing complements, rather than replaces, other approaches such as integration tests, property-based testing, and manual review.
Takeaway
Code coverage answers "did tests run this code?" Mutation testing answers "would tests catch a bug here?" The gap between these questions represents risk.
The paper's contribution is making mutation testing practical through prioritization, equivalent mutant detection, and incremental analysis. These techniques bring mutation testing from research curiosity to viable engineering practice.
Tools like PIT (Java), mutmut (Python), and Stryker (JavaScript) implement these ideas. They're worth running periodically, especially before releases, to find tests you didn't know you were missing.