
Case study: Finding dead code in Flask (69k stars)


Flask is one of the most popular Python web frameworks — 69,000+ stars, used everywhere from startups to Fortune 500s. It's also a great test case for dead code detection because it uses every pattern that trips up static analyzers: decorators, fixtures, class attribute overrides, dynamic registration, and callback functions that are never "called" in the traditional sense.

We ran both Skylos and Vulture against the Flask repository to see how they handle a real-world, framework-heavy codebase.

TL;DR

| Metric | Skylos | Vulture |
| --- | --- | --- |
| Dead code found | 7/7 (100%) | 6/7 (85.7%) |
| False positives | 12 | 260 |
| Precision | 36.8% | 2.3% |
| F1 Score | 53.8% | 4.4% |

Skylos found every dead item. Vulture missed one and produced about 21 times as many false positives (260 vs. 12).


Setup

We cloned the Flask repository and ran both tools at confidence level 20 (aggressive mode — catches more but also flags more). Every finding was manually verified against the source code. No automated labelling.

# Clone Flask
git clone https://github.com/pallets/flask
cd flask

# Run Skylos
skylos src/flask/ tests/ --json --confidence 20

# Run Vulture
vulture src/flask/ tests/ --min-confidence 20

Ground truth: 7 genuinely dead items and 261 confirmed-alive items that a tool might mistakenly flag.
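The precision and F1 figures in the TL;DR table follow directly from these counts. As a minimal sketch (the helper name `precision_recall_f1` is ours, not part of either tool), the arithmetic can be reproduced like this:

```python
def precision_recall_f1(true_positives: int, false_positives: int, total_dead: int):
    """Return (precision, recall, f1) as fractions between 0 and 1."""
    reported = true_positives + false_positives
    precision = true_positives / reported if reported else 0.0
    recall = true_positives / total_dead if total_dead else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


TOTAL_DEAD = 7  # ground-truth dead items in the Flask repository

# Counts taken from the TL;DR table: dead items found and false positives.
skylos = precision_recall_f1(true_positives=7, false_positives=12, total_dead=TOTAL_DEAD)
vulture = precision_recall_f1(true_positives=6, false_positives=260, total_dead=TOTAL_DEAD)

for name, (p, r, f1) in (("Skylos", skylos), ("Vulture", vulture)):
    print(f"{name}: precision={p:.1%} recall={r:.1%} F1={f1:.1%}")
```

Recall here is the "dead code found" row: the share of the 7 ground-truth items that each tool reported.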

What's actually dead in Flask?

After manual verification, we found 7 items with zero callers or importers anywhere in the repository:

| Item | Type | What it is |
| --- | --- | --- |
| cli.py:_path_is_ancestor | Function | Internal utility, never called |
| templating.py:_srcobj | Variable | Unused loop variable |
| typing.py:BeforeFirstRequestCallable | Variable | Type alias defined but never referenced |
| debughelpers.py:UnexpectedUnicodeError | Class | Exception never raised or caught |
| test_helpers.py:PyBytesIO | Class | Test helper never instantiated |
| multiapp.py:app1 | Variable | Test app variable never imported |
| multiapp.py:app2 | Variable | Test app variable never imported |

Skylos found all 7. Vulture found 6 and missed _srcobj, the unused loop variable in templating.py.

Where Vulture falls apart: false positives

This is where the gap becomes dramatic.

Vulture flagged 260 items that are actually used. Skylos flagged 12. That's a 21x difference.

Why does Vulture produce so many false positives?

Flask decorator callbacks. Flask registers route handlers and error handlers via decorators like @app.route(), @app.errorhandler(), and @app.before_request. The decorated function is never "called" directly — Flask's internal dispatch system invokes it. Vulture doesn't understand this pattern and flags every single one as unused.

from flask import Flask

app = Flask(__name__)

# Vulture says this is unused. It's not: Flask calls it before every request.
@app.before_request
def before_request1():
    ...

Skylos recognizes Flask decorator patterns and correctly marks these as used.
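To make the difference concrete, here is a rough sketch of a framework-aware check built on Python's `ast` module. This is not how Skylos or Vulture is implemented; the decorator list `REGISTRATION_DECORATORS` and the function `unreferenced_functions` are hypothetical names chosen for illustration. With framework awareness switched off, every handler looks unused; with it on, only the genuinely unreferenced helper remains.

```python
import ast

# Hypothetical set of decorator names that register a callback with the framework.
REGISTRATION_DECORATORS = {"route", "errorhandler", "before_request"}

SOURCE = '''
@app.before_request
def before_request1():
    ...

@app.route("/")
def index():
    ...

def helper():
    ...
'''


def decorator_name(node: ast.expr) -> str:
    """Return the trailing name of a decorator such as @app.route("/")."""
    if isinstance(node, ast.Call):  # @app.route("/") -> app.route
        node = node.func
    if isinstance(node, ast.Attribute):  # app.route -> "route"
        return node.attr
    if isinstance(node, ast.Name):
        return node.id
    return ""


def unreferenced_functions(source: str, framework_aware: bool) -> list[str]:
    """List functions whose names are never read anywhere in the module."""
    tree = ast.parse(source)
    referenced = {
        n.id
        for n in ast.walk(tree)
        if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)
    }
    unused = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.FunctionDef) or node.name in referenced:
            continue
        registered = any(
            decorator_name(d) in REGISTRATION_DECORATORS for d in node.decorator_list
        )
        if framework_aware and registered:
            continue  # the framework will call this function for us
        unused.append(node.name)
    return sorted(unused)


print(unreferenced_functions(SOURCE, framework_aware=False))
print(unreferenced_functions(SOURCE, framework_aware=True))
```

A real analyzer would also need to follow attribute access, imports, and cross-module references; the sketch only shows why decorator knowledge changes the verdict.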

Pytest fixtures. Flask's test suite uses @pytest.fixture extensively. Fixtures are injected by name, not by direct call. Vulture flags all of them.

import pytest

# Vulture says this is unused. pytest injects it by argument name.
@pytest.fixture
def leak_detector():
    ...

Skylos recognizes pytest fixture patterns and skips them.
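Injection by name is easy to imitate with the standard library, which shows why a plain "is this name ever called?" check misses it. The sketch below is a simplified imitation, not pytest's actual implementation; `FIXTURES`, `fixture`, `run_test`, and `check_no_leaks` are hypothetical names.

```python
import inspect

FIXTURES = {}  # hypothetical registry: fixture name -> factory function


def fixture(func):
    """Register func under its own name, loosely imitating @pytest.fixture."""
    FIXTURES[func.__name__] = func
    return func


@fixture
def leak_detector():
    return {"leaks": 0}


def check_no_leaks(leak_detector):
    # The parameter name is the only link to the fixture defined above.
    return leak_detector["leaks"] == 0


def run_test(test_func):
    """Resolve each parameter by name from the registry, then call the test."""
    params = inspect.signature(test_func).parameters
    kwargs = {name: FIXTURES[name]() for name in params}
    return test_func(**kwargs)


print(run_test(check_no_leaks))  # True
```

Nothing in this file calls `leak_detector()` by name, yet it runs every time the test does. The only trace of the dependency is a matching parameter name.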

Class attribute overrides. Flask subclasses Werkzeug classes and overrides attributes like default_mimetype, json_module, etc. These are read by the parent class, never referenced directly in Flask's own code. Vulture flags them. Skylos still flags some of them (8 false positives in the public API / class attributes category, versus 21 for Vulture), so a gap remains, but a smaller one.
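A small stand-in example shows the pattern. The classes `BaseResponse` and `HtmlResponse` below are hypothetical, not Flask's or Werkzeug's real classes; only the attribute name `default_mimetype` comes from the text above.

```python
class BaseResponse:
    """Stand-in for a parent class that lives in another library."""

    default_mimetype = "text/plain"

    def content_type(self) -> str:
        # The parent reads the attribute through self, so subclasses can override it.
        return f"{self.default_mimetype}; charset=utf-8"


class HtmlResponse(BaseResponse):
    # Never referenced by name in this module, yet it changes behavior.
    default_mimetype = "text/html"


print(HtmlResponse().content_type())  # text/html; charset=utf-8
```

The only read of `default_mimetype` happens inside the parent class. If that parent lives in a separate package, an analyzer that scans just the subclass's repository sees an assignment with no reader.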

Type checking test files. Flask has test files specifically for mypy/pyright type validation. The functions in these files exist to be type-checked, not called. Vulture flags all of them. Skylos skips them.

The false positive breakdown

| Category | Skylos FP | Vulture FP |
| --- | --- | --- |
| Flask decorator callbacks (routes, error handlers, hooks) | 0 | 183 |
| Pytest fixtures | 0 | 8 |
| Public API / class attributes | 8 | 21 |
| Config variables (loaded via from_object) | 4 | 4 |
| Type checking test files | 0 | 25 |
| CLI / blueprint registration | 0 | 4 |
| Other test patterns | 0 | 15 |
| Total | 12 | 260 |

The biggest category — Flask decorator callbacks — accounts for 183 of Vulture's 260 false positives. Skylos produces zero false positives in that category because it understands Flask's decorator registration model.
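As a quick sanity check on the table, the category counts can be summed and the decorator share computed. The numbers are copied from the breakdown above; the variable names are ours.

```python
# (category, Skylos false positives, Vulture false positives)
BREAKDOWN = [
    ("Flask decorator callbacks", 0, 183),
    ("Pytest fixtures", 0, 8),
    ("Public API / class attributes", 8, 21),
    ("Config variables", 4, 4),
    ("Type checking test files", 0, 25),
    ("CLI / blueprint registration", 0, 4),
    ("Other test patterns", 0, 15),
]

skylos_total = sum(skylos_fp for _, skylos_fp, _ in BREAKDOWN)
vulture_total = sum(vulture_fp for _, _, vulture_fp in BREAKDOWN)
decorator_share = BREAKDOWN[0][2] / vulture_total

print(skylos_total, vulture_total)  # 12 260
print(f"{decorator_share:.1%}")     # 70.4%
```

Decorator callbacks alone account for roughly 70% of Vulture's false positives on this codebase.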

What this means in practice

If you ran Vulture on Flask and tried to act on the results, you'd spend your time reviewing 266 "unused" items, 260 of which (97.7%) are false positives. After the first 20 or so, you'd likely stop trusting the tool and ignore everything, including the 6 real findings.

With Skylos, you get 19 findings in total: 7 are real dead code and 12 are false positives (8 from public API and class attribute overrides, 4 from config variables). You can review all 19 in a few minutes and act on the 7 real ones.

The dead code Skylos found

Here's what you'd clean up:

_path_is_ancestor in cli.py — An internal utility function that was probably used at some point during development but is no longer called. Safe to delete.

_srcobj in templating.py — An unused loop variable. The loop unpacks two values but only uses the second. Can be replaced with _.

BeforeFirstRequestCallable in typing.py — A type alias that was defined for the deprecated before_first_request feature. The feature was removed but the type alias stayed behind. Classic dead code pattern.

UnexpectedUnicodeError in debughelpers.py — An exception class that is never raised, caught, or imported anywhere. Likely from an earlier version of Flask's debug system.

PyBytesIO in test_helpers.py — A test helper class that was never instantiated in any test.

app1 and app2 in multiapp.py — Test app variables defined for CLI testing but never actually imported by the test suite.
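For the _srcobj case, the fix is a one-character rename. The loop below is a hypothetical illustration of the pattern, not the actual code from templating.py; the `loaders` data is made up.

```python
# Hypothetical data: (source object, loader name) pairs.
loaders = [("app", "app_loader"), ("blueprint", "bp_loader")]

# Before: the first element is bound to a name that is never read.
names_before = []
for _srcobj, loader in loaders:
    names_before.append(loader)

# After: "_" signals that the value is intentionally ignored.
names_after = []
for _, loader in loaders:
    names_after.append(loader)

print(names_before == names_after)  # True
```

Behavior is unchanged. The rename only tells readers and linters that the first value is deliberately discarded.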

Reproduce it yourself

git clone https://github.com/duriantaco/skylos-demo
cd skylos-demo/real_life_examples/flask
python3 ../benchmark_flask.py

The benchmark script runs both tools and compares results against manually verified ground truth. Every finding is documented.

Takeaway

Dead code detection on framework-heavy Python codebases requires understanding how the framework works. Pattern-matching tools like Vulture will drown you in false positives because they can't distinguish between "this function is never called" and "this function is called by the framework's internal dispatch."

Skylos isn't perfect — it still produces false positives on class attribute overrides. But on Flask, it found 100% of dead code with 21x fewer false positives than Vulture.

The gap is even wider on codebases that use multiple frameworks (Flask + pytest + Celery, for example). The more "magic" your code uses, the more framework awareness matters.


Want to try it on your own codebase? Install Skylos — it's free, open source, and runs 100% locally.