Dead-code detection is easy to make look good on a toy repository.
Write a function, never call it, run a scanner, print the finding. That demo tells you almost nothing about whether the tool is useful in a real codebase.
Real Python projects are different. They have framework entrypoints, pytest fixtures, plugin loading, dynamic imports, decorators, command-line scripts, optional dependencies, generated modules, and objects accessed through strings. A scanner that ignores those patterns will produce impressive-looking output and then waste maintainers' time.
So we tried a stricter test: find dead code in mature open-source repositories, submit small cleanup pull requests, and see what maintainers actually merge.
This is not a universal benchmark and it is not an endorsement claim. It is a practical test of whether selected findings can survive review outside our own repository.
The Result
Eight Skylos-assisted cleanup PRs were merged across seven mature open-source projects.
| Project | PR | Final GitHub diff | What was removed |
|---|---|---|---|
| Black | psf/black#5041 | 3 files, 0 additions, 24 deletions | unused internal parsing and node helpers |
| Black | psf/black#5052 | 3 files, 0 additions, 36 deletions | unused token helpers, parser debug methods, and a stale attribute |
| Flagsmith | Flagsmith/flagsmith#6953 | 10 files, 0 additions, 56 deletions | unused exceptions, serializers, response classes, and helper code |
| pypdf | py-pdf/pypdf#3685 | 1 file, 0 additions, 4 deletions | unused reverse encoding dictionaries |
| mitmproxy | mitmproxy/mitmproxy#8136 | 8 files, 2 additions, 44 deletions | unused console helpers, bit utilities, and stale imports |
| NetworkX | networkx/networkx#8572 | 5 files, 1 addition, 31 deletions | an unused private function and unused imports |
| Optuna | optuna/optuna#6547 | 5 files, 2 additions, 37 deletions | unused helpers, a method, a constant, and unpacked variables |
| beets | beetbox/beets#6473 | 4 files, 0 additions, 38 deletions | unused plugin helpers and a dead database type |
Final merged diff across those PRs:
| Merged PRs | Files changed | Additions | Deletions | Net change |
|---|---|---|---|---|
| 8 | 39 | 5 | 270 | -265 |
Those numbers are intentionally modest. The goal was not to open giant cleanup PRs. The goal was to find small changes maintainers could review quickly.
What This Proves, And What It Does Not
It proves that selected Skylos-assisted findings can turn into real merged cleanup work in mature repositories.
It does not prove that every finding is correct. It does not prove that Skylos is better than every specialized tool on every repository. It does not mean Black, NetworkX, Optuna, mitmproxy, pypdf, beets, or Flagsmith use or endorse Skylos.
That distinction matters. Static-analysis marketing often overclaims. Maintainer review is useful because it creates an external check, but it is still only one kind of evidence.
The stronger claim is narrower:
If the scanner can produce small dead-code candidates that maintainers accept, it is finding real maintenance debt, not only benchmark artifacts.
Why Dead-Code Detection Gets Noisy
Python makes static dead-code detection difficult because code can be reached without a direct textual call.
The standard library supports programmatic imports through importlib.import_module(). Plugin systems often discover code at runtime through package metadata or entrypoints. Frameworks call functions because a decorator, route table, migration runner, CLI config, or test runner knows about them, not because another Python file contains a normal function call.
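A minimal sketch of why this defeats textual reference counting: the plugin table below is invented for illustration, but it mirrors the pattern. The target module and function names live in data, so no source file ever contains a direct call.

```python
import importlib

# Hypothetical plugin table: module and function names are data,
# so a textual scan finds no call site for the targets.
PLUGINS = {
    "json_report": ("json", "dumps"),
}

def run_plugin(name, *args):
    module_name, func_name = PLUGINS[name]
    module = importlib.import_module(module_name)  # resolved at runtime
    func = getattr(module, func_name)              # no direct reference anywhere
    return func(*args)

print(run_plugin("json_report", {"ok": True}))  # → {"ok": true}
```

From the scanner's point of view, `json.dumps` here has zero textual callers, yet removing the indirection would break the program.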
The Vulture documentation shows the same problem in a small example: a method reached through getattr() can still be reported as unused, and the recommended fix is a whitelist. That is not a knock on Vulture. It is the central problem every Python dead-code tool has to face.
A practical scanner needs to assume that some findings are wrong until proven otherwise.
The False-Positive Traps We Checked
Before opening a PR, we looked for the common traps that make dead-code findings unsafe.
Dynamic dispatch
A method with no direct call can still be reached through getattr(obj, name), a command registry, a serializer map, a plugin loader, or a framework callback.
This is why a pure "zero textual references" rule is not enough.
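A toy example of the pattern (the class and method names are hypothetical): the handler method has no literal call site anywhere, but it is live because dispatch happens by string.

```python
class CommandHandler:
    # No file contains a literal "handler.cmd_status()" call,
    # yet the method is reachable through name-based dispatch.
    def cmd_status(self):
        return "ok"

    def dispatch(self, command):
        method = getattr(self, f"cmd_{command}", None)
        if method is None:
            raise ValueError(f"unknown command: {command}")
        return method()

handler = CommandHandler()
print(handler.dispatch("status"))  # reaches cmd_status with zero textual calls
```

A "zero textual references" rule would flag cmd_status; a safe workflow has to recognize the dispatch pattern first.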
Public exports
A private helper with no references is one thing. A public symbol exported through __all__, documented API surface, package metadata, or a compatibility layer is different. Removing public API can be a breaking change even if the repository itself does not call it.
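One cheap guard is to parse a module's __all__ before trusting a finding. The sketch below handles only the common literal-list case; real packages sometimes build __all__ dynamically, which this will miss.

```python
import ast

def exported_names(source):
    """Return names listed in a literal __all__ assignment, if any."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == "__all__":
                    if isinstance(node.value, (ast.List, ast.Tuple)):
                        return {
                            elt.value
                            for elt in node.value.elts
                            if isinstance(elt, ast.Constant)
                        }
    return set()

src = 'def helper(): ...\ndef api(): ...\n__all__ = ["api"]\n'
print(exported_names(src))  # {'api'}: helper is a candidate, api is protected
```

Anything in the returned set should be treated as public surface and escalated to maintainers rather than deleted outright.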
Framework entrypoints
Django serializers, FastAPI routes, pytest fixtures, Celery tasks, Alembic migrations, Click commands, and Pydantic validators can look unused if the scanner does not understand the framework.
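The common shape behind all of these is a decorator that registers the function somewhere the framework can find it. A stripped-down sketch (not any specific framework's API):

```python
# Minimal registry in the style of a web framework's route table.
ROUTES = {}

def route(path):
    def register(func):
        ROUTES[path] = func   # the framework, not user code, will call func
        return func
    return register

@route("/health")
def health_check():           # looks unused to a naive reference count
    return {"status": "ok"}

def handle_request(path):
    return ROUTES[path]()

print(handle_request("/health"))
```

A scanner that does not treat the decorator as a registration will report health_check as dead, which is exactly the false positive maintainers punish.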
Tests and generated paths
Some helpers exist only for tests, docs, examples, optional integrations, or generated files. Those are not always dead. Sometimes they are just outside the first scan path.
Near-miss names
A dead-code candidate can sit next to a live symbol with a similar name. In pypdf, for example, reverse encoding dictionaries looked unused, but the related _pdfdoc_encoding_rev needed to stay because it was still used and exported.
What Happened In The Merged PRs
Black: unused parser helpers
The two Black PRs removed unused parser helpers, token helpers, debug methods, and one stale attribute. The first PR removed matches_grammar(), lib2to3_unparse(), is_function_or_class(), and an unused Deprecated warning class. The second removed unused token helpers, parser debug functions, and a stale was_checked attribute.
The useful lesson from Black was not that the codebase was messy. It is not. The lesson is that even tightly maintained projects can retain internal helpers after refactors.
Flagsmith: framework-shaped dead code
Flagsmith was a good test because framework-heavy application code creates more false-positive risk than a small library. The merged PR removed unused exception classes, serializers, response classes, a Pydantic model, and a helper function.
This is the kind of code that often survives because it is harmless: no tests fail, no user sees it, and nobody wants to manually audit every serializer and exception class after a feature changes.
pypdf: the importance of keeping similar live code
pypdf had several reverse encoding dictionaries that were unused. The cleanup was small, but the important part was restraint: a similar dictionary, _pdfdoc_encoding_rev, was kept because it was still used.
That is exactly the kind of near-miss a dead-code workflow has to catch before a PR is opened.
mitmproxy: console helpers and utility code
The mitmproxy PR removed old console helpers, bit utility code, and stale imports. The final diff included 2 additions because not every first-pass removal survived review. That is normal. A useful workflow should expect maintainer feedback and keep the diff conservative.
NetworkX: private helpers and unused imports
NetworkX merged removal of one unused private function and several unused imports. The PR also restored one candidate after maintainer feedback pointed to a broader area that deserved separate review.
That is a good outcome. The purpose of the tool is not to bulldoze code; it is to surface candidates maintainers can reason about.
Optuna: visualizations and internal helpers
Optuna removed unused distribution and visualization helpers, an unused method on a label encoder, an unused constant, and unused unpacked variables. In the PR log, the initial scan had 28 findings and 11 survived after filtering out 17 false positives.
That ratio is important. It shows why raw finding counts are not the product. The product is the filtered set a maintainer can trust.
beets: plugin convenience wrappers
The beets PR removed unused ListenBrainz helper wrappers, an unused static helper, a superseded MusicBrainz collection helper, and a database type that was never instantiated.
Plugin-heavy repositories are exactly where naive dead-code detection can become noisy, so the PR had to stay small and specific.
The Workflow That Made The PRs Reviewable
The rough process was:
- Run the scanner.
- Group findings by risk.
- Check direct references.
- Check exports and package metadata.
- Check framework and plugin entrypoints.
- Remove only candidates that still looked dead.
- Keep PRs small enough for a maintainer to review.
- Treat maintainer feedback as part of the validation loop.
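The grouping step above can be sketched as a simple triage pass. The Finding fields here are invented for illustration; Skylos's actual output format may differ.

```python
from dataclasses import dataclass

# Hypothetical finding record, not Skylos's real schema.
@dataclass
class Finding:
    symbol: str
    is_public: bool       # exported via __all__, docs, or metadata
    framework_hit: bool   # matches a known decorator/entrypoint pattern

def triage(findings):
    """Group candidates by risk so only the safest reach a PR."""
    safe, review, skip = [], [], []
    for f in findings:
        if f.framework_hit:
            skip.append(f)       # likely false positive, do not submit
        elif f.is_public:
            review.append(f)     # needs explicit maintainer agreement
        else:
            safe.append(f)       # private, no dynamic path found
    return safe, review, skip

findings = [
    Finding("_old_helper", False, False),
    Finding("api_func", True, False),
    Finding("celery_task", False, True),
]
safe, review, skip = triage(findings)
print([f.symbol for f in safe])  # ['_old_helper']
```

Only the safe bucket becomes a PR; the review bucket becomes a question to maintainers, and the skip bucket stays out of the diff entirely.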
That last step matters. If a maintainer says a symbol is still part of public API, the correct response is to restore it, not argue that the graph says zero references.
Why We Did Not Submit Everything
Some scans produced many more findings than the final PRs. That is expected.
For example, the PR log for Optuna records 28 findings, with 11 confirmed after removing 17 false positives. The pending Celery work started from 300 findings, narrowed to 56 verified true positives, and then selected a much smaller PR-sized subset.
Raw findings are useful for exploration. They are not automatically useful as pull requests.
For open source maintainers, the right unit is a small reviewable diff:
- no broad rewrites,
- no public API removals unless maintainers agree,
- no changes that require trust in a tool,
- no cleanup mixed with style churn,
- clear explanation of why each symbol appears unused.
Lessons For Dead-Code Tools
1. Precision matters more than volume
A tool that prints 500 findings, 300 of them questionable, will not be trusted. A tool that prints fewer findings but explains why they are likely safe to remove is more useful.
2. Framework awareness is not optional
Python applications do not call everything directly. Static analysis has to understand decorators, config files, entrypoints, tests, migrations, serializers, and plugin conventions.
3. Public API must be treated differently
Internal dead code and public unused code are not the same thing. A library may intentionally keep a symbol for downstream users even if its own tests do not reference it.
4. Maintainer review is a better test than a synthetic demo
A benchmark can tell you whether a detector catches expected cases. A merged cleanup PR tells you whether the output was useful to someone who owns the code.
5. The best workflow is conservative
The goal is not to delete the most code. The goal is to delete code that is actually safe to remove.
How To Try This On Your Repo
Install Skylos and run a local scan:
```shell
pip install skylos
skylos .
```
For a PR gate, start with changed-code review instead of blocking the whole repository at once:
```shell
skylos . --diff origin/main
```
If you are evaluating dead-code findings, do not start by deleting everything. Start with a small branch and ask:
- Is the symbol private or public API?
- Is it exported through __all__, package metadata, or docs?
- Is it registered through a framework, CLI, test runner, migration system, or plugin entrypoint?
- Is it called through getattr(), importlib, a registry, or string dispatch?
- Does removing it keep tests green?
- Would a maintainer understand the diff in under five minutes?
- Would a maintainer understand the diff in under five minutes?
If the answer is unclear, keep the code or mark it for manual review.
The Bottom Line
Dead-code detection should not be judged by the biggest number in a terminal table.
The useful question is simpler:
Can the tool produce cleanup work that survives review in real repositories?
For these eight PRs, the answer was yes. That is not the end of the argument, but it is a stronger signal than another toy benchmark.
If you want to inspect the proof directly, the merged PR list is here: Real-world Skylos results.