Static analysis tools produce findings. The question is always the same: are the findings real?
Benchmarks help. Comparisons against other tools help. But the hardest test is submitting a pull request to a well-maintained open source project and seeing if the maintainer merges it or closes it.
We did that three times. All three were merged.
The PRs
| Project | Stars | PR | What was removed | Lines |
|---|---|---|---|---|
| Black | 39k+ | #5041 | 4 unused functions and classes | -24 |
| Flagsmith | 5k+ | #6953 | 10 unused classes, serializers, and exceptions | -56 |
| pypdf | 8k+ | #3685 | 5 unused constants and classes | -12 |
Total: 19 dead code findings removed, 92 lines deleted, 15 files changed, 0 regressions.
Every PR passed CI. No functionality broke. The code was genuinely dead.
What Skylos Found
Black: Ghost Functions in the Parser
Black is the most widely used Python formatter. Its codebase is tight and well-reviewed. We still found four things:
- `matches_grammar()` in `parsing.py`: a utility function with zero callers anywhere in the codebase
- `lib2to3_unparse()` in `parsing.py`: another utility function, also zero callers
- `is_function_or_class()` in `nodes.py`: a predicate function that nothing calls
- `Deprecated` in `mode.py`: a warning class that was never raised or referenced
These are classic dead code patterns: helper functions that were written for a feature, the feature changed, and the helpers stayed behind. Grep confirms zero references. The maintainer (JelleZijlstra) merged it the same day.
The only feedback was to remove the CHANGES.md entry — Black doesn't document internal cleanup in the changelog.
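The grep check mentioned above can be approximated in a few lines of Python. This is a minimal sketch, assuming a plain word-boundary text search is an acceptable proxy for "referenced"; `count_references` is a hypothetical helper written for this post, not part of Black or Skylos.

```python
import re
from pathlib import Path


def count_references(root: str, name: str) -> dict[str, int]:
    """Count whole-word occurrences of `name` in every .py file under `root`.

    Hypothetical helper for illustration. A purely textual search also
    matches comments and string literals, so it tends to over-count.
    """
    pattern = re.compile(rf"\b{re.escape(name)}\b")
    hits: dict[str, int] = {}
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        count = len(pattern.findall(text))
        if count:
            # Key by path relative to the search root for readable output.
            hits[str(path.relative_to(root))] = count
    return hits
```

A symbol whose only hit is its own definition line (one match, in one file) is a removal candidate. Anything beyond that needs a human or a second tool to look at the context.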
Flagsmith: 10 Unused Classes Across the API
Flagsmith is a feature flag platform. Their Python API had more dead code than expected — 10 items across 10 files:
- `WebhookSendError`: an exception class that was never raised or caught. The entire file was deleted.
- `WebhookURLSerializer`: a serializer that no view ever used
- `ViewResponseDoesNotHaveStatus`: another exception, never raised
- `ImproperlyConfiguredError`: never raised. File deleted.
- `IdentitySerializerFull`: a serializer with no references
- `BaseDetailedPermissionsSerializer`: never imported by anything
- `UTMDataModel`: a Pydantic model that was never instantiated
- `DispatchResponse`: a response class that was never returned
- `OAuthError`: an exception that was never raised
- `get_next_segment_priority()`: a function with zero callers
The pattern here is common in Django codebases: serializers, exceptions, and utility functions get created for features that evolve or get rewritten, and the old pieces stay behind because nothing breaks when dead code exists. It just sits there adding to grep noise and cognitive load.
Merged by matthewelwell within a day.
pypdf: Reverse Encoding Dicts That Nothing Used
pypdf is a pure-Python PDF library. The findings were small but clean:
- `FieldFlag` in `constants.py`: an `IntFlag` class implementing Table 8.70 of the PDF spec. Fully defined, never imported or used anywhere.
- Four `_*_encoding_rev` dictionaries in `_codecs/__init__.py`: reverse lookup dicts for Windows, Mac, Symbol, and ZapfDingbats encodings. All four were unused. (The similar `_pdfdoc_encoding_rev` was kept because it's used in `generic/_base.py` and exported in `__all__`.)
The maintainer (stefan6419846) merged it quickly. No discussion needed — the code was clearly dead.
How We Found Them
The workflow was two steps:
Step 1: Static scan. We ran `skylos . -a` on each repository. This builds a full reference graph (every definition, every call site, every import) and flags symbols with zero inbound references. Framework-aware rules filter out false positives from Django views, pytest fixtures, FastAPI routes, and similar patterns.
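To make "zero inbound references" concrete, here is a deliberately simplified single-module sketch built on Python's standard `ast` module. It is not Skylos's implementation: `unreferenced_definitions` is a hypothetical function, it compares bare names only, and it does no cross-file import resolution or framework-aware filtering.

```python
import ast


def unreferenced_definitions(source: str) -> list[str]:
    """Return top-level functions/classes that are never referenced in `source`.

    Simplified sketch: names are compared as plain strings, so a recursive
    function that only calls itself would still count as referenced.
    """
    tree = ast.parse(source)

    # Definitions: top-level functions and classes, in source order.
    defined = [
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]

    # References: every name that is read (foo) and every attribute (obj.foo).
    referenced: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            referenced.add(node.id)
        elif isinstance(node, ast.Attribute):
            referenced.add(node.attr)

    return [name for name in defined if name not in referenced]


sample = """
def used():
    return 1

def unused():
    return 2

class OldWarning(Warning):
    pass

print(used())
"""
print(unreferenced_definitions(sample))  # ['unused', 'OldWarning']
```

Everything this sketch flags is only a candidate. Names reached through strings, `__all__`, or framework registration never show up as `ast.Name` loads.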
Step 2: LLM verification. We ran `skylos agent verify` with Anthropic's Claude Sonnet to double-check the static findings. The agent uses a 3-pass architecture:
- Entry point discovery: reads `pyproject.toml`, Dockerfiles, and CI configs to understand how the project is used
- Batch verification: groups findings, runs multi-strategy grep (method calls, string dispatch, `__all__`, type annotations, imports), reads surrounding source code, and asks the LLM to reason about whether each finding is truly dead
- Survivor challenge: any finding that only escaped through a heuristic gets a second look
This filters out the false positives. Not everything Skylos flags is dead — some things are called via dynamic dispatch, string lookups, or framework magic. The verification agent catches those.
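A small illustration of why a zero-reference finding can still be live code. The handler names below are hypothetical and not taken from any of the three projects. The function names never appear as identifiers at a call site, because the name is assembled from a string at runtime.

```python
def handle_create(payload: dict) -> str:
    return f"created {payload['id']}"


def handle_delete(payload: dict) -> str:
    return f"deleted {payload['id']}"


def dispatch(action: str, payload: dict) -> str:
    # The handler name is built at runtime, so the source contains no
    # direct reference to handle_create or handle_delete.
    handler = globals()[f"handle_{action}"]
    return handler(payload)


result = dispatch("create", {"id": 7})
```

A reference graph built from identifiers alone would report both handlers as unused, yet deleting either one breaks `dispatch`. Spotting the `handle_` string prefix is the kind of evidence a string-dispatch search looks for.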
For the PRs, we only submitted findings where both the static analysis and the LLM verification agreed the code was dead. Conservative, but that's how you get PRs merged instead of closed.
What This Means
Dead code in well-maintained projects isn't a failure of the maintainers. It's a natural consequence of software evolution. Features change, code gets refactored, and the old pieces don't always get cleaned up because:
- Dead code doesn't break tests. It has zero callers, so no test fails while it lingers.
- Nobody's looking for it. Code review catches bad code, not absent callers.
- Grep is tedious. Manually verifying that a function has zero references across a large codebase is boring work that humans skip.
This is exactly what static analysis tools should handle. Find the candidates, verify them, and hand off a clean PR that maintainers can merge with confidence.
Try It
pip install skylos
skylos . -a
If you want LLM-powered verification on top:
skylos agent verify . --provider anthropic
All three PRs were generated from Skylos output with no manual analysis beyond reading the diffs before submitting. The tool did the finding, the LLM did the verification, and the maintainers confirmed the results.