How to Catch Hallucinated Imports and Phantom Calls in AI-Generated Python Code

Your team is writing more code faster than ever. Copilot, Claude, Cursor, and other AI assistants are generating hundreds of lines per day. Velocity is up.

But so is the rate of a specific category of bugs that barely existed before: code that references things that don't exist.

AI coding tools hallucinate. They generate function calls to functions that aren't in your codebase. They import packages that aren't installed. They reference APIs that don't exist in the version of the library you're using. And unlike a typo or a logic error, these problems often look perfectly correct at first glance.

This guide covers what these patterns look like, why they're dangerous, and how to catch them automatically.


What AI coding tools get wrong

Hallucinated imports

The AI writes an import statement for a package that doesn't exist or isn't installed in your environment:

from cryptoutils import secure_hash     # this package doesn't exist
from flask_permissions import require   # also doesn't exist
import pandas_profiling                 # deprecated, renamed to ydata-profiling

This is more dangerous than it sounds. If the AI generates from cryptoutils import secure_hash and you pip install cryptoutils to make it work, you might be installing a typosquatted package — a malicious package uploaded to PyPI under a plausible-sounding name.

In 2025 alone, PyPI removed hundreds of typosquatted packages targeting common import patterns. An AI-generated import pointing to a nonexistent package is essentially a supply chain attack waiting to happen.

Phantom function calls

The AI generates code that calls functions which don't exist anywhere in your project:

def process_order(order):
    validated = validate_order_schema(order)    # doesn't exist
    enriched = enrich_with_inventory(validated) # doesn't exist
    result = submit_to_payment_gateway(enriched) # doesn't exist
    send_order_confirmation(result)              # doesn't exist
    return result

Each of these looks like a reasonable function name. In a code review, especially a quick one, you might not notice that none of them are defined anywhere. The AI wrote the orchestration layer but forgot to write (or reference) the actual implementations.
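A single-file approximation of this check fits in a short AST pass. It is only a sketch: names bound by assignment aren't tracked, and cross-file definitions need a project-wide symbol table, which is exactly the bookkeeping a real tool does for you:

```python
import ast
import builtins

def phantom_calls(source: str) -> set[str]:
    """Names called like functions but never defined or imported in this module."""
    defined = set(dir(builtins))
    called = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.Import):
            defined.update(a.asname or a.name.split(".")[0] for a in node.names)
        elif isinstance(node, ast.ImportFrom):
            defined.update(a.asname or a.name for a in node.names)
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            called.add(node.func.id)
    return called - defined

code = """
def process_order(order):
    validated = validate_order_schema(order)
    return len(validated)
"""
print(phantom_calls(code))  # {'validate_order_schema'}
```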

Fake API methods

The AI generates calls to methods that don't exist on the object:

import boto3

s3 = boto3.client('s3')
# This method doesn't exist — the real one is list_objects_v2
objects = s3.list_all_objects(Bucket='my-bucket')

Or uses arguments that the function doesn't accept:

from flask import Flask

app = Flask(__name__)
app.run(host='0.0.0.0', port=8080, workers=4)  # Flask doesn't have a 'workers' param

Hardcoded credentials

AI tools frequently inline secrets during development:

client = openai.OpenAI(api_key="sk-proj-1234567890abcdef")

db = psycopg2.connect(
    host="prod-db.internal.company.com",
    password="SuperSecret123!"
)

The AI was trying to be helpful by filling in a complete, working example. But that API key or password is now in your source code, ready to be committed.
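A pre-commit grep for the obvious shapes costs almost nothing. The patterns below are illustrative, not exhaustive; dedicated scanners ship far larger rule sets plus entropy heuristics:

```python
import re

# Illustrative patterns only: a real scanner covers many more key formats.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{10,}"),       # OpenAI-style keys
    re.compile(r"sk_live_[A-Za-z0-9]{10,}"),    # Stripe live keys
    re.compile(r"(?i)(password|secret)\s*=\s*[\"'][^\"']+[\"']"),
]

def find_secrets(text: str) -> list[str]:
    """Return flagged lines as 'line N: <content>' strings."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in SECRET_PATTERNS:
            if pattern.search(line):
                hits.append(f"line {lineno}: {line.strip()}")
                break
    return hits

sample = 'client = OpenAI(api_key="sk-proj-1234567890abcdef")'
print(find_secrets(sample))
```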

Disabled security controls

AI-generated code often takes shortcuts with security:

import requests
response = requests.get(url, verify=False)  # SSL verification disabled

import jwt
decoded = jwt.decode(token, algorithms=["none"])  # JWT verification disabled

app = Flask(__name__)
app.config['DEBUG'] = True  # Debug mode in production

These work during development. They also create vulnerabilities in production.
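Because these shortcuts are literal keyword arguments, they're also easy to catch statically. A minimal AST sketch; the table of risky flags is an assumption you would extend, and list-valued arguments like algorithms=["none"] need an extra case:

```python
import ast

# Assumed starter table: (keyword name, constant value) -> finding label.
RISKY_KWARGS = {
    ("verify", False): "TLS verification disabled",
    ("debug", True): "debug mode enabled",
}

def risky_flags(source: str) -> list[str]:
    """Flag calls that pass a known security-disabling keyword argument."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        for kw in node.keywords:
            if isinstance(kw.value, ast.Constant):
                label = RISKY_KWARGS.get((kw.arg, kw.value.value))
                if label:
                    findings.append(f"line {node.lineno}: {label}")
    return findings

print(risky_flags("requests.get(url, verify=False)"))
# ['line 1: TLS verification disabled']
```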


Why traditional tools miss these

Linters catch syntax, not semantics

Pylint and Flake8 will catch an unused import or an undefined variable. But they won't catch:

  • A call to a function that's supposed to be defined in another file but was never written
  • An import of a package that exists on PyPI but isn't in your requirements.txt
  • An API method that looks valid but doesn't exist in the version you're using

Security scanners focus on known patterns

Bandit and Semgrep look for known vulnerability patterns — eval(), pickle.loads(), SQL injection via string formatting. They don't look for "this function call goes nowhere" or "this import resolves to a package you've never installed."

Code review doesn't scale

A careful code reviewer would catch phantom function calls. But:

  • AI tools generate code fast — faster than reviewers can review
  • The generated code often looks plausible and well-structured
  • Reviewers focus on logic, not on verifying every import and function reference
  • In a team of 10, you'd need to catch these across dozens of PRs per day

How to detect AI code problems automatically

Skylos discover: find LLM integration points

Skylos includes a discovery module that maps where your codebase integrates with LLM APIs:

skylos discover src/

This scans your code for:

  • Direct API calls to OpenAI, Anthropic, Cohere, and other LLM providers
  • Framework integrations (LangChain, LlamaIndex, Haystack)
  • Prompt construction patterns
  • RAG pipeline components

Output example:

LLM Integration Discovery
──────────────────────────
Found 3 LLM integration points:

src/chat.py:12      openai.ChatCompletion.create()
src/rag.py:34       RetrievalQA.from_chain_type()
src/prompts.py:8    f-string prompt with user input

This tells you where AI-related code exists in your project so you can focus security review on those areas.

Skylos defend: score AI code security

Once you know where AI integrations live, assess their security posture:

skylos defend src/

Skylos runs 13 security checks across your AI code, including:

Check                   What it detects
───────────────────────────────────────────────────────────────────
Prompt injection        User input concatenated directly into prompts
Input validation        Missing length/content validation on LLM inputs
Output sanitization     LLM responses used without sanitization
RAG context isolation   RAG queries that could leak cross-tenant data
PII filtering           Personal data sent to LLM APIs without scrubbing
Rate limiting           Missing rate limits on LLM endpoints
Cost controls           No token/spend limits on API calls
Logging                 LLM interactions not logged for audit

Each integration point gets a defense score out of 100:

Defense Score: 45/100

Missing controls:
  ✗ No input validation on user prompts
  ✗ No output sanitization
  ✗ No rate limiting on /chat endpoint
  ✓ API key stored in environment variable
  ✓ Logging present

Skylos standard analysis: dead code + security

Skylos's standard analysis covers the broader AI-generated code problems (phantom calls, hallucinated imports, hardcoded secrets):

skylos src/ --danger

This catches:

Hardcoded credentials (SKY-D201):

src/config.py:7  [HIGH] Hardcoded credential: API_KEY = "sk-proj-..."

Command injection from user input (SKY-D212):

src/tools.py:23  [HIGH] Command injection: user input flows to subprocess.call()

Weak cryptography (SKY-D214):

src/auth.py:15  [MEDIUM] Weak hash: hashlib.md5() used for password hashing

Disabled SSL verification (SKY-D220):

src/api.py:9  [MEDIUM] SSL verification disabled: verify=False

Unused functions (dead code from abandoned AI drafts):

src/utils.py:45   unused function  generate_report_v1
src/utils.py:89   unused function  _format_legacy_response
src/helpers.py:12  unused import   from pandas_profiling import ProfileReport

Combining all three

For a comprehensive scan of a codebase that uses AI tools:

# Find where LLM integrations exist
skylos discover src/

# Score their security posture
skylos defend src/

# Scan everything for dead code + security + quality
skylos src/ --danger --quality

Running this in CI

Catch AI code problems on every PR

skylos cicd init

This generates a GitHub Actions workflow that:

  1. Scans every PR for security vulnerabilities, dead code, and quality issues
  2. Posts inline comments on findings directly in the PR
  3. Fails the pipeline if critical findings are present

For AI-specific scanning, add defend checks to your workflow:

name: Skylos AI Code Check
on:
  pull_request:
    branches: [main]

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install skylos
      - run: skylos . --danger --quality --github
      - run: skylos defend . --fail-on high

The --fail-on high flag blocks the PR if any high-severity AI security findings are present.

Pre-commit hook

Catch problems before they even reach the PR:

repos:
  - repo: https://github.com/duriantaco/skylos
    rev: v3.5.10
    hooks:
      - id: skylos
        args: ['--danger']

Real patterns we've seen

The "helpful" AI that imports malware

An AI assistant suggested:

from python_utils import encryption

The developer installed python-utils from PyPI to make the import work. The actual package had nothing to do with encryption — it was a generic utilities package. Worse, lookalike typosquats of popular utility names have repeatedly appeared on PyPI, some carrying reverse shells, and installing whatever makes a hallucinated import resolve is exactly how they spread.
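One detail worth knowing when vetting a suggested install: PyPI and pip normalize runs of dots, dashes, and underscores to a single dash (PEP 503), so punctuation variants of a name resolve to the same project. A name that differs only in punctuation is therefore the same package; real typosquats differ in spelling:

```python
import re

def normalize(name: str) -> str:
    """PEP 503 name normalization, as used by PyPI and pip."""
    return re.sub(r"[-_.]+", "-", name).lower()

print(normalize("python_utils"))   # python-utils
print(normalize("Python.Utils"))   # python-utils: same project as above
print(normalize("pythonutils"))    # pythonutils: a genuinely different name
```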

The orchestrator with no implementation

AI generated a clean service layer:

class OrderService:
    def create_order(self, data):
        validated = OrderValidator.validate(data)
        inventory = InventoryService.reserve(validated.items)
        payment = PaymentGateway.charge(validated.total)
        notification = NotificationService.send_confirmation(payment)
        return Order.create(validated, inventory, payment, notification)

None of those classes existed. The code passed review because the logic was sound. It crashed at runtime.

The copy-paste credentials

AI auto-completed a configuration block:

STRIPE_SECRET_KEY = "sk_live_51H7b..."
OPENAI_API_KEY = "sk-proj-abc123..."
DATABASE_URL = "postgresql://admin:password@prod-db:5432/main"

The developer committed it, pushed, and the keys were scraped from the public GitHub repo within minutes.


What this means for your team

If your team uses AI coding tools (and most teams do now), you need automated scanning that catches AI-specific failure modes:

  1. Hallucinated imports — flag imports of packages not in your requirements
  2. Phantom function calls — flag calls to functions not defined in your codebase
  3. Hardcoded secrets — flag credentials that AI tools inline
  4. Disabled security — flag verify=False, debug=True, algorithms=['none']
  5. Dead code accumulation — flag the abandoned drafts that AI tools leave behind

Traditional linters and security scanners catch some of these. None of them catch all of them. That's the gap Skylos fills.


Get started

pip install skylos

# Scan for security + dead code
skylos src/ --danger

# Find LLM integration points
skylos discover src/

# Score AI code security posture
skylos defend src/

Skylos is open source. View on GitHub | Docs | Install