Why AI-Generated Python Code Is Insecure in 2026
If you have been reading recent industry reports on AI-generated code, you have probably noticed they all rhyme.
Different methodologies. Different vendors. Different sample sizes. But the headline number keeps landing in the same place:
Around 45 percent of AI-generated code ships with at least one security flaw, with Python landing around 38 percent in Veracode's October 2025 language breakdown.
That is not a marketing number. It is what fell out of Veracode's October 2025 GenAI Code Security Report, which tested 100-plus large language models on 80 coding tasks and ran the output through real static analysis. Veracode's Spring 2026 update, with an expanded model set, found the rate effectively unchanged.
This guide does three things:
- Summarizes the empirical picture as it stands in May 2026.
- Names the specific Python failure patterns driving the rate.
- Maps each pattern to the deterministic static analysis gate that catches it before merge.
If you want a single sentence to take away, it is this:
The vulnerabilities AI ships in Python are not exotic. They are the boring ones, at much higher volume, often hidden inside diffs that look polished.
The numbers in one place
Across the primary research and vendor reports published over the last twelve months, the picture looks like this:
- Around 45 percent of AI-generated code samples contain a security flaw, measured by Veracode across more than 100 large language models and 80 coding tasks. The Spring 2026 update, with a larger model set, did not move the rate.
- Python landed around 38 percent in the same Veracode language breakdown. Python, C#, and JavaScript clustered in the 38 to 45 percent range, while Java skewed substantially higher.
- AI-generated code introduces materially more privilege-escalation risk than human-written code. Apiiro's September 2025 review of AI coding assistants found roughly 322 percent more privilege-escalation paths in AI-generated changes versus human baselines, on top of the broader vulnerability rate.
- Slopsquatting is no longer theoretical. The USENIX Security 2025 paper by Spracklen et al. tested 16 models across 576,000 code samples and reported an overall package-hallucination rate of about 19.7 percent, with Python at 15.8 percent, JavaScript at 21.3 percent, commercial models at 5.2 percent, and open-source models at 21.7 percent. Hallucinated names are already being registered on PyPI and npm and pulled into real repos through AI-suggested install lines.
The exact percentages will keep moving as models, frameworks, and harnesses change. The shape of the problem will not. AI is producing more code than humans can read, and a meaningful fraction of it is shipping insecure.
This is why the conversation has shifted from "should we use AI coding assistants?" to "what does the merge gate look like when half of the diffs are model output?"
What is actually breaking in AI-generated Python
The interesting part of the data is not the headline percentage. It is which classes of bug keep showing up.
Five patterns dominate.
1. Missing or weakened input validation
This is one of the most common patterns. The model writes a route, a handler, or a parser, and it accepts whatever shape the caller provides.
A typical AI-written FastAPI handler will happily take a dict, pass it into a database call, and never validate that the field types, lengths, or character sets are what the rest of the system expects.
What is happening under the hood: the model is optimizing for runnable, plausible code. Validation feels like ceremony. So it drops it, or it adds a stub that does not actually enforce anything.
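To make the pattern concrete, here is a minimal sketch, with hypothetical handler and model names, of the unvalidated shape assistants tend to produce next to a Pydantic-enforced version a gate can require:

```python
from fastapi import FastAPI
from pydantic import BaseModel, ConfigDict, Field

app = FastAPI()

def save_user(data: dict) -> dict:
    # stand-in for the real persistence layer
    return data

# The shape AI assistants tend to produce: the handler accepts any dict and
# forwards it to the data layer with no type, length, or charset checks.
@app.post("/users/unvalidated")
async def create_user_unvalidated(payload: dict):
    return save_user(payload)  # whatever the caller sent reaches the sink

# The shape a merge gate should require: every field is typed and bounded,
# and unexpected keys are rejected instead of silently accepted.
class CreateUser(BaseModel):
    model_config = ConfigDict(extra="forbid")  # reject unknown fields
    username: str = Field(min_length=3, max_length=32, pattern=r"^[a-z0-9_]+$")
    email: str = Field(max_length=254)

@app.post("/users")
async def create_user(payload: CreateUser):
    return save_user(payload.model_dump())
```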
The static signal is concrete:
- Untyped or Any-typed inputs reaching sinks.
- Pydantic models configured with extra='allow', or models that silently ignore unexpected fields where strict validation is required.
- request.json() flowing straight into ORM filters or template strings.
A diff-aware Python static check should treat any new sink that consumes unvalidated request data as merge-blocking, not advisory.
2. Weak or default cryptography
When asked to "hash a password," "generate a token," or "encrypt this," AI-generated Python reaches for whatever is fastest to write. Often that means:
- hashlib.md5 or sha1 for passwords or signatures.
- random.random() or random.choice() for secret material instead of secrets.
- Crypto.Cipher.AES in MODE_ECB because it is the simplest to demonstrate.
- Hardcoded IVs or salts that look "deterministic" and "easy to test."
These are not new bugs. They are 20-year-old bugs at higher frequency, because the model has seen all of them in training data and they all run.
Static checks for weak crypto and insecure randomness in Python are mature. The fix is not better detection. It is making those checks block the merge, not produce a comment that someone agrees with later.
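As a reference point, a minimal stdlib-only sketch, with illustrative helper names, of the patterns those checks flag next to the replacements a blocking rule should push toward:

```python
import hashlib
import os
import random
import secrets

# What AI output often looks like: fast, runnable, and broken.
def hash_password_weak(password: str) -> str:
    return hashlib.md5(password.encode()).hexdigest()  # unsalted, trivially crackable

def make_token_weak() -> str:
    return "".join(random.choice("abcdef0123456789") for _ in range(32))  # not a CSPRNG

# Standard-library replacements a blocking rule should require instead.
def hash_password(password: str) -> str:
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt.hex() + ":" + digest.hex()

def make_token() -> str:
    return secrets.token_urlsafe(32)  # cryptographically secure randomness
```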
3. Broken or removed authorization
This is the one that tends to show up in incident postmortems.
A developer asks an assistant to "refactor this endpoint" or "make this faster." The assistant returns a clean, readable handler. It also quietly removes the @requires_auth decorator, drops the current_user check, or rewrites the permission logic in a way that always returns True.
It looks like a routine refactor. The diff is small. The logic still works for the happy path. The endpoint is now public.
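A before-and-after sketch of that kind of diff, using Flask and an illustrative requires_auth decorator rather than any specific project's code:

```python
from functools import wraps
from flask import Flask, abort, g

app = Flask(__name__)

def requires_auth(view):
    # illustrative auth decorator: rejects requests with no authenticated user
    @wraps(view)
    def wrapper(*args, **kwargs):
        if getattr(g, "user", None) is None:
            abort(401)
        return view(*args, **kwargs)
    return wrapper

# Before the "refactor": the route is guarded twice over.
@app.route("/admin/export")
@requires_auth
def export_data():
    if not g.user.get("can_export", False):
        abort(403)
    return {"status": "export started"}

# What the AI rewrite tends to produce: same behavior on the happy path,
# smaller and cleaner diff, decorator and permission check silently gone.
@app.route("/admin/export-rewritten")
def export_data_rewritten():
    return {"status": "export started"}  # now reachable by anyone
```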
This is not catchable by AI review alone, because the AI review tool is reading the new state, not comparing it against the previous security posture. The signal you actually need is:
- Removed authorization decorator on a route that previously had one.
- Removed permission check inside a handler.
- A new public route added without going through your standard auth layer.
A diff-aware Python static gate is the right tool here, because the bug only exists relative to the previous version of the file. That is exactly the shape Skylos was built for.
4. Over-permissive defaults and configurations
This is the category Apiiro's September 2025 review of AI coding assistants highlighted with the 322-percent privilege-escalation finding. In Python projects, the same pattern shows up as:
- Flask or FastAPI CORS set to * "for development."
- Django DEBUG=True or ALLOWED_HOSTS=['*'] left in committed config.
- IAM policies in generated Terraform or boto3 helpers that grant *:* because the model "did not know which actions were needed."
- File system writes into /tmp with 0o777 permissions because that "always works."
- AWS S3 or GCS resources created with public access blocks disabled, ACLs re-enabled, or overly broad bucket and IAM policies.
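A hypothetical Django-style settings fragment showing how these defaults tend to land in a committed file, with the values a blocking rule should require instead:

```python
import os

# How the defaults often land in an AI-generated PR.
DEBUG = True                    # should be off everywhere except local development
ALLOWED_HOSTS = ["*"]           # should list the real hostnames
CORS_ALLOW_ALL_ORIGINS = True   # django-cors-headers: should be an explicit allowlist

# What the merge gate should require instead (values are illustrative).
DEBUG = os.environ.get("DJANGO_DEBUG", "false").lower() == "true"
ALLOWED_HOSTS = ["api.example.com"]
CORS_ALLOWED_ORIGINS = ["https://app.example.com"]
```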
These are obvious in isolation. They get past review because they are wrapped inside a much larger AI-generated feature PR, and the reviewer is already on their tenth diff of the morning.
Static rules for these are well understood. The discipline is making them part of the gate.
5. Hallucinated and risky imports
Slopsquatting is the AI version of typosquatting. The model invents a plausible-sounding package name, the developer accepts the suggestion, and the install line either fails or, increasingly, succeeds because someone has already registered the hallucinated name on PyPI.
Concrete examples documented over the last two years:
- The huggingface-cli proof-of-concept by Bar Lanyado at Lasso Security (registered in late 2023 to early 2024 against the real huggingface_hub[cli] install line). The hallucinated form picked up more than 30,000 authentic downloads over three months purely from AI-suggested install instructions.
- Cross-language hallucinations bleeding into AI agent skill libraries. In January 2026, Aikido Security's Charlie Eriksen found the hallucinated npm package react-codeshift referenced in roughly 237 real GitHub repositories, then registered the package and observed real download attempts from AI-suggested commands.
- Supply chain attacks on AI-stack packages themselves. In March 2026, litellm versions 1.82.7 and 1.82.8 (published March 24 by the threat actor known as "TeamPCP" through a poisoned Trivy GitHub Action) shipped credential-harvesting malware. Version 1.82.8 also used a malicious .pth file to run during Python startup and target SSH keys, AWS, GCP, and Azure cloud credentials, and Kubernetes configs.
The defense layer here is twofold: a deterministic check on the imports the diff actually adds, and a CI step that resolves and pins them before they reach the merge queue. AI review tools cannot reliably catch hallucinated dependencies because, to the model, the import looks fine.
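The first half of that layer can be as small as a script that parses the imports a changed file actually adds and fails when any of them is neither standard library nor a pinned requirement. A minimal sketch, assuming Python 3.10+, pinned dependencies in requirements.txt, and a short allowlist of your own top-level packages; real tools also map import names to distribution names, which this deliberately skips:

```python
import ast
import sys
from pathlib import Path

STDLIB = set(sys.stdlib_module_names)  # available from Python 3.10
LOCAL_PACKAGES = {"myapp"}             # assumption: your own top-level package names

def pinned_requirements(path: str = "requirements.txt") -> set[str]:
    # crude parse: keep the distribution name before any version specifier;
    # a real check would also map import names to distribution names (yaml -> PyYAML, etc.)
    names = set()
    for line in Path(path).read_text().splitlines():
        line = line.split("#")[0].strip()
        if line and not line.startswith("-"):
            name = line.split("==")[0].split(">=")[0].split("[")[0].strip()
            names.add(name.lower().replace("-", "_"))
    return names

def top_level_imports(pyfile: str) -> set[str]:
    # collect the top-level module names this file imports, ignoring relative imports
    tree = ast.parse(Path(pyfile).read_text())
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module.split(".")[0])
    return found

if __name__ == "__main__":
    allowed = STDLIB | LOCAL_PACKAGES | pinned_requirements()
    unknown = set()
    for changed_file in sys.argv[1:]:  # pass the diff's changed .py files
        unknown |= {m for m in top_level_imports(changed_file) if m.lower() not in allowed}
    if unknown:
        print(f"Unknown or unpinned imports added in this diff: {sorted(unknown)}")
        sys.exit(1)  # block the merge until they are reviewed and pinned
```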
Why AI code review alone does not close the gap
AI review assistants like GitHub Copilot code review and the new generation of Claude-based reviewers are real and useful. They catch obvious issues, they accelerate the human reviewer, and they raise the floor on PR feedback.
But they are not a security gate, and their own documentation says so.
Three reasons they leave the ~45 percent rate roughly where it is:
- They are non-deterministic. The same diff reviewed twice can produce different feedback. That is fine for prose. It is not fine for a merge gate.
- They are biased toward the visible diff. Removed code is much harder to flag than added code. Many of the worst AI-introduced regressions are deletions: removed validators, removed decorators, removed middleware.
- They do not see the system. A snippet-level review does not know that this route is the only one without auth, or that this query is the only one without parameter binding, or that this dependency was never previously imported anywhere in the repo.
You want both. AI review for breadth and reviewer assistance. Deterministic static analysis for the things you cannot afford to merge.
What a Python merge gate for AI-generated code should look like
For Python repos in 2026, the working pattern is:
- Local pre-commit run. Catch the obvious failures before they ever become a PR. Insecure crypto, debug flags, hardcoded secrets, hallucinated imports.
- CI scan on every PR, diff-aware. Run the same checks against the actual changed lines and the changed-file context, so signal stays high on large AI diffs.
- Block on a small, opinionated set of categories. Missing validation on new sinks, removed auth or permission checks, weak crypto, over-permissive configs, unknown or newly added imports. Other findings can be advisory.
- Treat dead code as a security signal, not just hygiene. AI agents leave behind orphaned routes, unreachable error paths, and unused validators. Each one is either a removed feature or a removed guard, and you want to know which.
- Make the gate part of the merge contract. Status check required, no manual override unless logged.
This is roughly the workflow Skylos is designed for: a Python-first, diff-aware, deterministic gate that runs locally and in CI, with explicit attention to dead code and AI-introduced regressions. The point is not the tool. The point is the gate.
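For the CI half, the gating mechanics can be sketched in a few lines: collect the files this branch changed, run the blocking checks against only those files, and let the exit code decide the status check. This assumes Bandit is installed and origin/main is the base branch; a purpose-built diff-aware tool goes further (changed-line scoping, removed-guard detection), but the contract is the same:

```python
import subprocess
import sys

def changed_python_files(base: str = "origin/main") -> list[str]:
    # files added, copied, or modified by this branch relative to the base
    out = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=ACM", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    )
    return [f for f in out.stdout.splitlines() if f.endswith(".py")]

def main() -> int:
    files = changed_python_files()
    if not files:
        return 0
    # Bandit covers the weak-crypto, insecure-randomness, and debug-flag classes;
    # a nonzero exit on changed files fails the required status check.
    return subprocess.run(["bandit", "-q", *files]).returncode

if __name__ == "__main__":
    sys.exit(main())
```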
A practical first week
If you want to act on this without rewriting your toolchain, the first week looks like:
- Day 1. Pick one repo with the most AI-generated PR volume. Measure how many of last month's PRs were AI-assisted.
- Day 2. Run a Python SAST and dead code scan on the current main branch. Capture the baseline.
- Day 3. Wire the same scan into CI as advisory only. Watch one week of PRs.
- Day 4 to 6. Pick the three failure classes that show up most. Promote those to merge-blocking. Leave the rest advisory for now.
- Day 7. Review with the team. Adjust thresholds. Decide which checks graduate to blocking next sprint.
After two sprints, you will know exactly which AI-generated patterns your team is producing and which gates actually pay rent.
The bottom line
AI is not making Python developers worse. It is making the rate of plausibly-correct, subtly-insecure code much higher.
The roughly 45 percent vulnerability rate is not a permanent state of the world. It is the state of the world when teams treat AI output the same way they treat hand-written senior-engineer output. That assumption is the bug.
Treat AI-generated Python as untrusted by default. Put a deterministic gate in front of merge. Make the small, boring failure classes block, not warn. The numbers will move.
If you want a Python-first, diff-aware place to start, Skylos Cloud is built for exactly this workflow. Or jump straight to the AI code review security PR checklist and the secure MCP server guide for the adjacent surfaces.