AI-generated code is shipping vulnerabilities
AI coding assistants are everywhere now. They make teams faster and let them ship features sooner. But they also introduce a new kind of risk: AI-generated code often looks correct, but is it really?
This pattern is so common that large-scale evaluations have found a substantial fraction of AI-generated code failing basic vulnerability categories like injection and insecure defaults. Veracode's 2025 report evaluated more than 100 LLMs and found that many of them routinely produced insecure output.
Source: Veracode 2025 GenAI Code Security Report
Summary
This post covers the following key points:
- Common vulnerability patterns include injection, broken access control, weak auth, and unsafe defaults
- AI introduces unique failure modes such as hallucinated dependencies, phantom helper functions, and a false sense of security
- The solution is building a secure AI coding pipeline with high-signal SAST, agentic verification, and PR-time guardrails
To be very clear, this is not an "AI is bad" post. It is a "treat AI output like untrusted input" post.
Why AI-generated code fails security checks
Humans write insecure code too. But AI adds a specific flavor of insecurity.
The model is trained to be helpful, not safe
LLMs learn from massive code corpora. They absorb patterns, including insecure ones, and reproduce them confidently.
Research shows AI assistants generate code containing security weaknesses at non-trivial rates. One peer-reviewed analysis of Copilot-generated code found weaknesses in a substantial share of snippets across multiple languages.
These same vibe-coded security flaws are then fed back into the corpora used to fine-tune and train the next generation of models.
Source: ACM study (Fu et al.)
Fake-good security (security theater)
A dangerous pattern is when AI produces code that looks secure:
# This looks safe, but the helpers may not exist or may do nothing
user_input = sanitize_input(request.data)
if validate_user(user_id):
    check_permissions(user_id, resource)
    return get_resource(resource)
These functions are often missing, incomplete, never called, or not connected to enforcement. The code passes visual inspection and still ships a vulnerability.
Hallucinated dependencies
A newer failure mode is library hallucination: importing packages that do not exist, calling functions that are not real, or using APIs that were deprecated long ago.
# This import may not exist
from security_utils_new import auto_sanitize_sql
# The model hallucinated this package
import flask_secured  # The model hallucinated this package; it does not exist on PyPI
This is not just reliability pain. It becomes a security risk when teams patch around it quickly and accidentally disable important safety checks.
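For example, a common quick fix for a hallucinated import is to wrap it in a try/except and stub out the missing helper, which quietly removes the check the code appeared to have. The package and function names below are the same illustrative ones as above:
# The quick "fix" silently disables sanitization
try:
    from security_utils_new import auto_sanitize_sql  # hallucinated, never installs
except ImportError:
    def auto_sanitize_sql(query):
        return query  # no-op fallback: queries now ship unsanitized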
Source: Library Hallucinations in LLMs (Twist et al.)
The most common vibe-coded vulnerabilities
These are the patterns you will see repeatedly when teams ship AI-generated code under time pressure.
Injection vulnerabilities
Classic issue: raw user input flows into dangerous sinks. Even modern teams still get hit by injection because AI often picks the code that works, not the code that is safest.
SQL Injection Example:
# Vulnerable: string concat
def get_user(user_id):
    query = f"SELECT * FROM users WHERE id = '{user_id}'"
    return db.execute(query)

# Secure: parameterized query
def get_user(user_id):
    query = "SELECT * FROM users WHERE id = %s"
    return db.execute(query, (user_id,))
Command Injection Example:
# Vulnerable: user input reaches the shell
import os

def convert_file(filename):
    os.system(f"convert {filename} output.pdf")

# Secure: argument list, no shell
import subprocess

def convert_file(filename):
    subprocess.run(["convert", filename, "output.pdf"], check=True)
Template Injection Example:
# Vulnerable: user input used as the template itself
@app.route("/hi")
def hi():
    template = request.args.get("template", "Hi {{ name }}")
    return render_template_string(template, name="User")

# Secure: never render user input as a template
@app.route("/hi")
def hi():
    name = request.args.get("name", "User")
    return render_template("hi.html", name=name)
Broken access control
This is one of the biggest real-world risk categories. A lot of working code forgets to enforce who is allowed to do what, especially in multi-step workflows.
AI will happily generate endpoints like these and omit permission checks because the prompt did not demand them:
// Vulnerable: no auth check
app.get("/invoice/:id", async (req, res) => {
  const invoice = await Invoice.findById(req.params.id)
  res.json(invoice)
})

// Secure: verify ownership
app.get("/invoice/:id", async (req, res) => {
  const invoice = await Invoice.findById(req.params.id)
  if (!invoice) {
    return res.status(404).json({ error: "Not found" })
  }
  if (invoice.userId !== req.user.id) {
    return res.status(403).json({ error: "Forbidden" })
  }
  res.json(invoice)
})
Multi-step workflow example:
# Vulnerable: checks permission only once and not on state change
class OrderWorkflow:
    def __init__(self, order_id, user):
        self.order = Order.get(order_id)
        if self.order.user_id != user.id:
            raise PermissionError()

    def cancel(self):
        # Missing: re-check if user can cancel at this state
        self.order.status = "cancelled"
        self.order.save()

# Secure: check permissions at each state transition
class OrderWorkflow:
    def __init__(self, order_id, user):
        self.order = Order.get(order_id)
        self.user = user
        self._verify_access()

    def _verify_access(self):
        if self.order.user_id != self.user.id:
            raise PermissionError("Not your order")

    def cancel(self):
        self._verify_access()
        if self.order.status not in ["pending", "confirmed"]:
            raise InvalidStateError("Cannot cancel at this stage")
        self.order.status = "cancelled"
        self.order.save()
OWASP ranks Broken Access Control as the top web application risk category (A01 in the 2021 Top 10).
Unsafe defaults
AI commonly defaults to configurations that are convenient but insecure:
# Vulnerable defaults in Flask
app = Flask(__name__)
app.config["DEBUG"] = True # Exposes debugger in production
app.config["SECRET_KEY"] = "dev" # Predictable secret
# Secure configuration
import os
app = Flask(__name__)
app.config["DEBUG"] = os.environ.get("FLASK_DEBUG", "false").lower() == "true"
app.config["SECRET_KEY"] = os.environ.get("SECRET_KEY")
if not app.config["SECRET_KEY"]:
    raise RuntimeError("SECRET_KEY environment variable required")
CORS misconfiguration:
// Vulnerable: allows any origin
app.use(cors({
  origin: "*",
  credentials: true // This combination is dangerous
}))

// Secure: explicit allowlist
const allowedOrigins = [
  "https://app.example.com",
  "https://admin.example.com"
]

app.use(cors({
  origin: (origin, callback) => {
    if (!origin || allowedOrigins.includes(origin)) {
      callback(null, true)
    } else {
      callback(new Error("Not allowed by CORS"))
    }
  },
  credentials: true
}))
Weak token generation:
# Vulnerable: predictable tokens
import random
import string

def generate_token():
    return "".join(random.choices(string.ascii_letters, k=32))

# Secure: cryptographically random tokens
import secrets

def generate_token():
    return secrets.token_urlsafe(32)
Tool misuse in agentic workflows
As agents gain tool access (filesystem, git, IDE actions), security failures do not just live in the code. They live in the automation layer.
# Vulnerable: agent with unrestricted file access
class CodeAgent:
    def write_file(self, path, content):
        with open(path, "w") as f:
            f.write(content)

# Secure: sandboxed file operations
import os

class SecurityError(Exception):
    pass

class CodeAgent:
    ALLOWED_PATHS = ["/workspace/src", "/workspace/tests"]

    def write_file(self, path, content):
        abs_path = os.path.abspath(path)
        # Match on a directory boundary, not a raw string prefix
        if not any(abs_path.startswith(p + os.sep) for p in self.ALLOWED_PATHS):
            raise SecurityError(f"Path not allowed: {path}")
        if ".." in path:
            raise SecurityError("Path traversal detected")
        with open(abs_path, "w") as f:
            f.write(content)
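The same allowlist idea applies to the commands an agent is allowed to run. A minimal sketch, reusing the SecurityError exception from above; the allowed command set is an illustrative assumption:
import shlex
import subprocess

ALLOWED_COMMANDS = {"git", "pytest", "ruff"}  # illustrative allowlist

def run_tool(command_line):
    args = shlex.split(command_line)
    if not args or args[0] not in ALLOWED_COMMANDS:
        raise SecurityError(f"Command not allowed: {args[:1]}")
    # Passing a list (shell=False) prevents chaining arbitrary shell commands
    return subprocess.run(args, capture_output=True, text=True, timeout=120)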
Source: OWASP GenAI Agentic Guidance
What actually works: secure AI coding as a pipeline
If your organization is using AI to write code, here is the correct mindset: AI output is untrusted until verified.
That means your workflow should behave like a security pipeline, not a suggestion box.
Step 1: Enforce high-signal SAST
AI code needs modern SAST, not grep-based security scanning.
Look for tools that provide:
- Dataflow and taint tracking
- Sanitizer detection
- Framework-aware modeling
- Reachability checks (is this code even callable?)
The goal is not to scan more. The goal is to be right more often.
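To see what dataflow buys you over pattern matching, here is a deliberately tiny taint check: it only flags db.execute() when the query f-string actually uses a variable derived from a .get() call on a request-style object. This is a toy for intuition, not a real analyzer; production tools track taint across functions, files, and frameworks:
import ast

def toy_taint_check(source_code):
    tree = ast.parse(source_code)

    # Pass 1: variables assigned from something.get(...), a crude stand-in for request sources
    tainted = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Call):
            func = node.value.func
            if isinstance(func, ast.Attribute) and func.attr == "get":
                for target in node.targets:
                    if isinstance(target, ast.Name):
                        tainted.add(target.id)

    # Pass 2: flag execute(f"...") sinks whose f-string interpolates a tainted variable
    findings = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "execute"):
            for arg in node.args:
                if isinstance(arg, ast.JoinedStr):
                    names = {n.id for n in ast.walk(arg) if isinstance(n, ast.Name)}
                    if names & tainted:
                        findings.append(f"Line {node.lineno}: request-derived data reaches a SQL sink")
    return findings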
Step 2: Add AI-specific static checks
Traditional SAST misses many AI failure modes. Add targeted heuristics that catch vibe-coded patterns with high precision.
Phantom security calls detector:
# Static analysis rule: flag undefined security functions
import ast
SECURITY_PATTERNS = [
    "sanitize_", "validate_", "check_permission",
    "auth_", "verify_", "secure_"
]

def find_phantom_security_calls(source_code):
    tree = ast.parse(source_code)
    defined_functions = set()
    called_functions = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            defined_functions.add(node.name)
        elif isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name):
                called_functions.append(node.func.id)
    findings = []
    for func in called_functions:
        if any(func.startswith(p) for p in SECURITY_PATTERNS):
            if func not in defined_functions:
                findings.append(f"Undefined security function: {func}")
    return findings
Security TODO detector:
# Flag security TODOs that shouldn't ship
import re
def find_security_todos(code):
    pattern = r"#.*(TODO|FIXME).*(auth|secur|permission)"
    return [f"Line {i}: {line.strip()}"
            for i, line in enumerate(code.split("\n"), 1)
            if re.search(pattern, line, re.IGNORECASE)]
Hallucinated dependency detector:
# Compare imports against lockfile
import ast
import json
import sys
def find_hallucinated_imports(source_code, lockfile_path):
    tree = ast.parse(source_code)
    with open(lockfile_path) as f:
        if lockfile_path.endswith(".json"):
            lockfile = json.load(f)
            installed = set(lockfile.get("packages", {}).keys())
        else:
            # requirements.txt format
            installed = set()
            for line in f:
                if line.strip() and not line.startswith("#"):
                    pkg = line.split("==")[0].split(">=")[0].strip()
                    installed.add(pkg)

    # Caveat: import names do not always match distribution names
    # (e.g. "import yaml" is installed as PyYAML), so treat hits as review signals
    findings = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                pkg = alias.name.split(".")[0]
                if pkg not in installed and pkg not in sys.stdlib_module_names:
                    findings.append(f"Potentially hallucinated import: {pkg}")
        elif isinstance(node, ast.ImportFrom):
            if node.module:
                pkg = node.module.split(".")[0]
                if pkg not in installed and pkg not in sys.stdlib_module_names:
                    findings.append(f"Potentially hallucinated import: {pkg}")
    return findings
Step 3: Agentic verification (build it carefully, or you recreate the same problem)
Static analysis is necessary but not sufficient. AI-generated code should be verified with tests that prove security properties.
Auth test example:
import pytest

class TestInvoiceAuthorization:
    def test_user_can_access_own_invoice(self, client, user, user_invoice):
        response = client.get(
            f"/invoice/{user_invoice.id}",
            headers={"Authorization": f"Bearer {user.token}"}
        )
        assert response.status_code == 200

    def test_user_cannot_access_other_invoice(self, client, user, other_invoice):
        response = client.get(
            f"/invoice/{other_invoice.id}",
            headers={"Authorization": f"Bearer {user.token}"}
        )
        assert response.status_code == 403

    def test_unauth_cannot_access_invoice(self, client, user_invoice):
        response = client.get(f"/invoice/{user_invoice.id}")
        assert response.status_code == 401

    def test_invalid_invoice_id_returns_404(self, client, user):
        response = client.get(
            "/invoice/nonexistent-id",
            headers={"Authorization": f"Bearer {user.token}"}
        )
        assert response.status_code == 404
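To make the verification agentic rather than manual, wire the suite into the loop that accepts or rejects the model's output. A minimal sketch, assuming the authorization tests above are tagged with a pytest marker named security and that ask_model_to_fix is your own callback into the assistant (both names are assumptions, not a specific framework):
import subprocess

MAX_ATTEMPTS = 3

def run_security_tests():
    # Assumes the tests above are marked with @pytest.mark.security
    return subprocess.run(["pytest", "-m", "security", "-q"], capture_output=True, text=True)

def verify_generated_change(ask_model_to_fix):
    for _ in range(MAX_ATTEMPTS):
        result = run_security_tests()
        if result.returncode == 0:
            return True  # security properties hold; safe to open the PR
        # Feed the concrete failures back to the model instead of shipping and hoping
        ask_model_to_fix(result.stdout + result.stderr)
    return False  # escalate to a human reviewer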
Step 4: PR gating that developers will not disable
If you want adoption, gate on:
- New findings only
- High confidence
- High severity
- Actionable fixes
Do not block merges on legacy noise. That is how scanners get turned off.
name: Security Gate
on: [pull_request]
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # needed so origin/main is available for the diff
      - name: Run SAST on changed files
        run: |
          pip install semgrep
          git diff --name-only origin/main...HEAD | \
            xargs semgrep --config=auto --severity=ERROR
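To enforce "new findings only" without blocking on legacy noise, diff the scanner output against a committed baseline and fail only on the delta. A minimal sketch; the JSON field names (rule_id, path, fingerprint) are assumptions you would adapt to your scanner's output format:
import json
import sys

def load_findings(path):
    with open(path) as f:
        return {(x["rule_id"], x["path"], x["fingerprint"]) for x in json.load(f)}

def main(current="findings.json", baseline="baseline.json"):
    new = load_findings(current) - load_findings(baseline)
    for rule_id, file_path, _ in sorted(new):
        print(f"NEW: {rule_id} in {file_path}")
    return 1 if new else 0

if __name__ == "__main__":
    sys.exit(main())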
Step 5: Suppressions with reasons
False positives never hit zero. Suppressions must be reviewable, scoped, explained, and auditable.
# .security-suppressions.yaml
suppressions:
  - rule: "SQLI-001"
    file: "billing/service.py"
    line: 142
    reason: "Parameterized query with int() constraint; no attacker-controlled SQL"
    owner: "appsec"
    createdAt: "2026-01-23"
    expiresAt: "2026-07-23"
  - rule: "HARDCODED-SECRET"
    file: "tests/fixtures.py"
    line: 15
    reason: "Test-only mock API key, not used in production"
    owner: "backend-team"
    createdAt: "2026-01-20"
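A small CI check keeps that file honest by rejecting entries that are missing fields or past their expiry date. A sketch assuming the file format above and PyYAML installed:
import sys
from datetime import date
import yaml  # pip install pyyaml

REQUIRED_FIELDS = {"rule", "file", "reason", "owner", "createdAt"}

def check_suppressions(path=".security-suppressions.yaml"):
    with open(path) as f:
        data = yaml.safe_load(f) or {}
    errors = []
    for i, entry in enumerate(data.get("suppressions", [])):
        missing = REQUIRED_FIELDS - entry.keys()
        if missing:
            errors.append(f"Entry {i}: missing fields {sorted(missing)}")
        expires = entry.get("expiresAt")
        if expires and date.fromisoformat(str(expires)) < date.today():
            errors.append(f"Entry {i} ({entry.get('rule')}): expired on {expires}")
    return errors

if __name__ == "__main__":
    problems = check_suppressions()
    for problem in problems:
        print(problem)
    sys.exit(1 if problems else 0)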
The real takeaway
AI code is not worse than human code. It is faster to produce, which means it can outpace your ability to review and secure it.
That creates a new reality: the bottleneck is no longer writing code. The bottleneck is verifying code safely. Teams that succeed in 2026 will build three things: high-signal security scanning, AI-aware static checks, and verification workflows that scale.
Because vibe coding is not going away.