How to Catch Hallucinated Imports and Phantom Calls in AI-Generated Python Code
Your team is writing more code faster than ever. Copilot, Claude, Cursor, and other AI assistants are generating hundreds of lines per day. Velocity is up.
But so is the rate of a specific category of bugs that barely existed before: code that references things that don't exist.
AI coding tools hallucinate. They generate function calls to functions that aren't in your codebase. They import packages that aren't installed. They reference APIs that don't exist in the version of the library you're using. And unlike a typo or a logic error, these problems often look perfectly correct at first glance.
This guide covers what these patterns look like, why they're dangerous, and how to catch them automatically.
What AI coding tools get wrong
Hallucinated imports
The AI writes an import statement for a package that doesn't exist or isn't installed in your environment:
```python
from cryptoutils import secure_hash    # this package doesn't exist
from flask_permissions import require  # also doesn't exist
import pandas_profiling                # deprecated, renamed to ydata-profiling
```
This is more dangerous than it sounds. If the AI generates `from cryptoutils import secure_hash` and you run `pip install cryptoutils` to make the import work, you might be installing a typosquatted package: a malicious package uploaded to PyPI under a plausible-sounding name.
In 2025 alone, PyPI removed hundreds of typosquatted packages targeting common import patterns. An AI-generated import pointing to a nonexistent package is essentially a supply chain attack waiting to happen.
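One cheap guardrail is to resolve every import against the current environment before trusting generated code. Here is a minimal sketch using only the standard library (`ast` plus `importlib.util.find_spec`); the function name is ours, and dedicated tools do considerably more (version pins, `requirements.txt` cross-checks):

```python
import ast
import importlib.util

def find_unresolvable_imports(source: str) -> list[str]:
    """Return top-level module names imported in `source` that do not
    resolve in the current environment."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # skip relative imports; they resolve within the project
            modules.add(node.module.split(".")[0])
    return sorted(m for m in modules if importlib.util.find_spec(m) is None)
```

Run it over a diff before merging: anything it returns is either a missing dependency or a hallucination, and either way it deserves a human look before anyone types `pip install`.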
Phantom function calls
The AI generates code that calls functions which don't exist anywhere in your project:
```python
def process_order(order):
    validated = validate_order_schema(order)      # doesn't exist
    enriched = enrich_with_inventory(validated)   # doesn't exist
    result = submit_to_payment_gateway(enriched)  # doesn't exist
    send_order_confirmation(result)               # doesn't exist
    return result
```
Each of these looks like a reasonable function name. In a code review, especially a quick one, you might not notice that none of them are defined anywhere. The AI wrote the orchestration layer but forgot to write (or reference) the actual implementations.
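Catching this mechanically is straightforward for calls to plain names within a single file. A rough sketch of the idea (the function and its behavior are ours, not Skylos's; a real tool resolves names across files and scopes):

```python
import ast
import builtins

def find_phantom_calls(source: str) -> list[str]:
    """Flag calls to bare names that are neither defined, imported,
    nor builtins in this module. Single-file approximation only."""
    tree = ast.parse(source)
    known = set(dir(builtins))
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            known.add(node.name)
        elif isinstance(node, ast.Import):
            known.update((a.asname or a.name).split(".")[0] for a in node.names)
        elif isinstance(node, ast.ImportFrom):
            known.update(a.asname or a.name for a in node.names)
        elif isinstance(node, ast.Assign):
            known.update(t.id for t in node.targets if isinstance(t, ast.Name))
    phantoms = {
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id not in known
    }
    return sorted(phantoms)
```

Applied to the `process_order` example above, all four helper calls come back flagged, because nothing in the file (or the builtins) defines them.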
Fake API methods
The AI generates calls to methods that don't exist on the object:
```python
import boto3

s3 = boto3.client('s3')
# This method doesn't exist — the real one is list_objects_v2
objects = s3.list_all_objects(Bucket='my-bucket')
```
Or it passes arguments the function doesn't accept:
```python
from flask import Flask

app = Flask(__name__)
app.run(host='0.0.0.0', port=8080, workers=4)  # Flask doesn't have a 'workers' param
```
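Wrong keyword arguments, at least, can often be caught at test time by binding against the real signature. A sketch using the stdlib `inspect` module (the `accepts_kwargs` helper and the `greet` function are illustrative, not part of any library):

```python
import inspect

def accepts_kwargs(func, **kwargs) -> bool:
    """True if `func` could be called with these keyword arguments."""
    try:
        inspect.signature(func).bind_partial(**kwargs)
        return True
    except TypeError:
        return False

def greet(name, *, excited=False):
    return f"Hello, {name}{'!' if excited else '.'}"

print(accepts_kwargs(greet, name="Ada", excited=True))  # True
print(accepts_kwargs(greet, name="Ada", workers=4))     # False: no 'workers' param
```

One caveat: this only helps for functions with explicit signatures. APIs that accept `**kwargs` and forward them elsewhere (Flask's `app.run` passes extra options to the underlying server) will happily bind anything, which is part of why hallucinated arguments slip through.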
Hardcoded credentials
AI tools frequently inline secrets during development:
```python
client = openai.OpenAI(api_key="sk-proj-1234567890abcdef")

db = psycopg2.connect(
    host="prod-db.internal.company.com",
    password="SuperSecret123!",
)
```
The AI was trying to be helpful by filling in a complete, working example. But that API key or password is now in your source code, ready to be committed.
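Most of these follow recognizable shapes, which is why pattern matching catches a large share of them before commit. A minimal sketch of the idea behind secret scanners (the patterns here are illustrative and far from exhaustive; production scanners ship hundreds, plus entropy checks):

```python
import re

# Illustrative patterns only -- not a complete ruleset
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{16,}"),                   # OpenAI-style keys
    re.compile(r"sk_live_[A-Za-z0-9]{10,}"),                # Stripe live keys
    re.compile(r"password\s*=\s*[\"'][^\"']+[\"']", re.I),  # inline passwords
]

def find_secrets(source: str) -> list[int]:
    """Return line numbers that look like they contain a hardcoded secret."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), 1)
        if any(p.search(line) for p in SECRET_PATTERNS)
    ]
```

Wired into a pre-commit hook, even a crude version of this stops the most common leak: the "complete, working example" that was never supposed to leave the developer's machine.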
Disabled security controls
AI-generated code often takes shortcuts with security:
```python
import requests
response = requests.get(url, verify=False)  # SSL verification disabled

import jwt
decoded = jwt.decode(token, algorithms=["none"])  # JWT verification disabled

app = Flask(__name__)
app.config['DEBUG'] = True  # debug mode in production
```
These work during development. They also create vulnerabilities in production.
Why traditional tools miss these
Linters catch syntax, not semantics
Pylint and Flake8 will catch an unused import or an undefined variable. But they won't catch:
- A function call to a function defined in another file that was never written
- An import of a package that exists on PyPI but isn't in your `requirements.txt`
- An API method that looks valid but doesn't exist in the version you're using
Security scanners focus on known patterns
Bandit and Semgrep look for known vulnerability patterns: `eval()`, `pickle.loads()`, SQL injection via string formatting. They don't look for "this function call goes nowhere" or "this import resolves to a package you've never installed."
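Both tool families work the same way underneath: match a syntactic pattern, report the location. A toy version of one such check, written against the stdlib `ast` module (the function name is ours; Bandit's equivalent finding for disabled certificate validation is B501):

```python
import ast

def find_verify_false(source: str) -> list[int]:
    """Report line numbers of any call passing verify=False,
    the pattern behind findings like Bandit's B501."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            for kw in node.keywords:
                if (kw.arg == "verify"
                        and isinstance(kw.value, ast.Constant)
                        and kw.value.value is False):
                    hits.append(node.lineno)
    return sorted(hits)
```

The limitation is visible in the code itself: it knows about `verify=False` because someone wrote a rule for `verify=False`. A phantom call or hallucinated import matches no rule, so pattern-based scanners stay silent.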
Code review doesn't scale
A careful code reviewer would catch phantom function calls. But:
- AI tools generate code fast — faster than reviewers can review
- The generated code often looks plausible and well-structured
- Reviewers focus on logic, not on verifying every import and function reference
- In a team of 10, you'd need to catch these across dozens of PRs per day
How to detect AI code problems automatically
Skylos discover: find LLM integration points
Skylos includes a discovery module that maps where your codebase integrates with LLM APIs:
```shell
skylos discover src/
```
This scans your code for:
- Direct API calls to OpenAI, Anthropic, Cohere, and other LLM providers
- Framework integrations (LangChain, LlamaIndex, Haystack)
- Prompt construction patterns
- RAG pipeline components
Output example:
```
LLM Integration Discovery
──────────────────────────
Found 3 LLM integration points:

src/chat.py:12     openai.ChatCompletion.create()
src/rag.py:34      RetrievalQA.from_chain_type()
src/prompts.py:8   f-string prompt with user input
```
This tells you where AI-related code exists in your project so you can focus security review on those areas.
Skylos defend: score AI code security
Once you know where AI integrations live, assess their security posture:
```shell
skylos defend src/
```
Skylos runs 13 security checks across your AI code, including:
| Check | What it detects |
|---|---|
| Prompt injection | User input concatenated directly into prompts |
| Input validation | Missing length/content validation on LLM inputs |
| Output sanitization | LLM responses used without sanitization |
| RAG context isolation | RAG queries that could leak cross-tenant data |
| PII filtering | Personal data sent to LLM APIs without scrubbing |
| Rate limiting | Missing rate limits on LLM endpoints |
| Cost controls | No token/spend limits on API calls |
| Logging | LLM interactions not logged for audit |
Each integration point gets a defense score out of 100:
```
Defense Score: 45/100

✗ No input validation on user prompts
✗ No output sanitization
✗ No rate limiting on /chat endpoint
✓ API key stored in environment variable
✓ Logging present
```
Skylos standard analysis: dead code + security
Skylos's standard analysis covers the broader AI-generated code problems: phantom calls, hallucinated imports, hardcoded secrets.
```shell
skylos src/ --danger
```
This catches:
Hardcoded credentials (SKY-D201):

```
src/config.py:7    [HIGH] Hardcoded credential: API_KEY = "sk-proj-..."
```

Command injection from user input (SKY-D212):

```
src/tools.py:23    [HIGH] Command injection: user input flows to subprocess.call()
```

Weak cryptography (SKY-D214):

```
src/auth.py:15     [MEDIUM] Weak hash: hashlib.md5() used for password hashing
```

Disabled SSL verification (SKY-D220):

```
src/api.py:9       [MEDIUM] SSL verification disabled: verify=False
```

Unused functions (dead code from abandoned AI drafts):

```
src/utils.py:45    unused function generate_report_v1
src/utils.py:89    unused function _format_legacy_response
src/helpers.py:12  unused import from pandas_profiling import ProfileReport
```
Combining all three
For a comprehensive scan of a codebase that uses AI tools:
```shell
# Find where LLM integrations exist
skylos discover src/

# Score their security posture
skylos defend src/

# Scan everything for dead code + security + quality
skylos src/ --danger --quality
```
Running this in CI
Catch AI code problems on every PR
```shell
skylos cicd init
```
This generates a GitHub Actions workflow that:
- Scans every PR for security vulnerabilities, dead code, and quality issues
- Posts inline comments on findings directly in the PR
- Fails the pipeline if critical findings are present
For AI-specific scanning, add defend checks to your workflow:
```yaml
name: Skylos AI Code Check

on:
  pull_request:
    branches: [main]

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install skylos
      - run: skylos . --danger --quality --github
      - run: skylos defend . --fail-on high
```
The `--fail-on high` flag blocks the PR if any high-severity AI security findings are present.
Pre-commit hook
Catch problems before they even reach the PR:
```yaml
repos:
  - repo: https://github.com/duriantaco/skylos
    rev: v3.5.10
    hooks:
      - id: skylos
        args: ['--danger']
```
Real patterns we've seen
The "helpful" AI that imports malware
An AI assistant suggested:
```python
from python_utils import encryption
```
The developer installed `python-utils` from PyPI to make the import work. The actual package had nothing to do with encryption; it was a generic utilities package. But a typosquatted version (`python_utils`, with an underscore) was also on PyPI and contained a reverse shell.
The orchestrator with no implementation
AI generated a clean service layer:
```python
class OrderService:
    def create_order(self, data):
        validated = OrderValidator.validate(data)
        inventory = InventoryService.reserve(validated.items)
        payment = PaymentGateway.charge(validated.total)
        notification = NotificationService.send_confirmation(payment)
        return Order.create(validated, inventory, payment, notification)
```
None of those classes existed. The code passed review because the logic was sound. It crashed at runtime.
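The failure mode is worth spelling out: Python resolves names at call time, not at definition time, so a module full of phantom references imports cleanly and even passes a smoke test that never exercises the broken path. A minimal demonstration:

```python
def create_order(data):
    # OrderValidator is defined nowhere; Python doesn't care until
    # this line actually executes
    return OrderValidator.validate(data)

# The module imports and the function defines without error...
try:
    create_order({})
except NameError as exc:
    print(exc)  # name 'OrderValidator' is not defined
```

This is why static reference checking matters more for Python than for compiled languages: there is no compiler standing between a plausible-looking orchestration layer and production.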
The copy-paste credentials
AI auto-completed a configuration block:
```python
STRIPE_SECRET_KEY = "sk_live_51H7b..."
OPENAI_API_KEY = "sk-proj-abc123..."
DATABASE_URL = "postgresql://admin:password@prod-db:5432/main"
```
The developer committed it, pushed, and the keys were scraped from the public GitHub repo within minutes.
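The fix is mechanical: secrets come from the environment (or a secrets manager), and startup fails loudly if they're missing. A minimal sketch of the pattern, with an illustrative helper name:

```python
import os

def load_secret(name: str) -> str:
    """Fetch a required secret from the environment; fail fast if it is
    absent rather than falling back to a hardcoded value."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; refusing to start")
    return value

# Usage at startup, e.g.:
#   OPENAI_API_KEY = load_secret("OPENAI_API_KEY")
```

Failing fast here is deliberate: a missing secret that surfaces at boot is an inconvenience; one that silently defaults to a committed placeholder is an incident.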
What this means for your team
If your team uses AI coding tools (and most teams do now), you need automated scanning that catches AI-specific failure modes:
- Hallucinated imports — flag imports of packages not in your requirements
- Phantom function calls — flag calls to functions not defined in your codebase
- Hardcoded secrets — flag credentials that AI tools inline
- Disabled security — flag `verify=False`, `debug=True`, `algorithms=['none']`
- Dead code accumulation — flag the abandoned drafts that AI tools leave behind
Traditional linters and security scanners catch some of these. None of them catch all of them. That's the gap Skylos fills.
Get started
```shell
pip install skylos

# Scan for security + dead code
skylos src/ --danger

# Find LLM integration points
skylos discover src/

# Score AI code security posture
skylos defend src/
```
Related
- Semgrep vs Skylos for Python
- Best Python SAST Tools in 2026
- How to Detect Dead Code in Python
- Python Security Scanner for GitHub Actions
- AI-Generated Python Code Is Shipping Vulnerabilities
- We Scanned 9 Popular Python Libraries
Skylos is open source. View on GitHub | Docs | Install