AI Coding Agent Security Checklist
AI coding agents are not just autocomplete anymore.
They can read your repository, edit several files, run commands, install packages, update tests, call MCP tools, and open pull requests. Some teams already let them handle small fixes end to end.
That is useful. It is also a different security model.
When an agent only suggests a function, the main question is:
Is this generated code safe?
When an agent can take actions inside a repository, the question becomes:
What can this tool chain do if the prompt is wrong, the repo instructions are hostile, a dependency is malicious, or the generated PR removes a security control?
This guide gives you a practical AI coding agent security checklist for teams using Claude Code, Cursor, Codex, Copilot, Devin, Windsurf, Gemini CLI, OpenHands, or any internal coding agent.
If you want the short version, here it is:
Treat every AI coding agent like an untrusted contributor with a fast keyboard and tool access. Let it help, but verify every security-relevant action before merge.
Why this matters now
The risk has moved from snippets to workflows.
Research keeps pointing in the same direction:
- The SecureVibeBench preprint evaluates code agents on realistic multi-file tasks based on OSS-Fuzz vulnerability-introducing scenarios. The authors report that even the best-performing agent produced correct and secure solutions on only 23.8% of tasks.
- The DepDec-Bench preprint looks at dependency decision-making and reports that AI agents selected PR-time known-vulnerable dependency versions in 2.46% of studied agent-authored dependency changes, with a net-negative security impact compared with human-authored changes.
- Snyk's ToxicSkills research scanned 3,984 agent skills; it reported at least one security flaw in 36.82% of them and critical issues in 13.4%.
- Mindgard's trust-persistence writeup argues that coding-agent trust decisions can become stale when project-controlled configuration changes after a folder has already been approved.
You do not need to agree with every methodology to see the pattern.
AI agents are now part of the development supply chain. They consume repo instructions, execute project tooling, choose dependencies, and create diffs. If your controls only check the final code, you are missing the path that produced it.
The checklist
Use this before letting AI agents work on production repositories.
| Control | What to check | Why it matters |
|---|---|---|
| 1. Scope repo trust | Agents should not inherit permanent trust for any future repo config change | A trusted folder can become unsafe after a malicious or compromised commit |
| 2. Review agent instructions | AGENTS.md, CLAUDE.md, .cursor/rules, .github/copilot-instructions.md, and similar files | These files can steer the agent's behavior without looking like code |
| 3. Lock down tool access | Shell, package manager, cloud CLI, database, browser, MCP, and file write permissions | Tool permissions define the blast radius |
| 4. Require approval for sensitive actions | Installs, migrations, secrets access, deletes, deploys, external network calls | The agent should not silently perform irreversible or high-risk work |
| 5. Gate dependency changes | New packages, version bumps, transitive dependency changes, install scripts | Agents can choose vulnerable or unnecessary packages that tests do not flag |
| 6. Scan for removed controls | Deleted auth, tenant filters, validation, rate limits, CSRF checks, audit logs | AI refactors often look clean while removing security boundaries |
| 7. Scan generated code locally | Run SAST, secret detection, dead code, and quality checks before opening the PR | Do not make human reviewers catch machine-checkable problems |
| 8. Keep PRs small | Split large agent diffs and reject broad "cleanup" PRs | Big AI PRs hide removed controls and unreviewed behavior changes |
| 9. Require negative tests | Unauthenticated, unauthorized, cross-tenant, invalid input, rate-limit, and failure tests | Happy-path tests prove the feature works, not that abuse fails |
| 10. Treat generated config as risky | CI workflows, package scripts, Dockerfiles, MCP config, hooks, env examples | Configuration can execute code or widen access |
| 11. Log agent-authored changes | Actor, prompt/source, tool calls when available, PR link, scan result | Auditability matters when the change was produced by a tool chain |
| 12. Block before merge | CI gate for high-confidence security regressions | Advisory comments do not scale when agent output increases PR volume |
The rest of this guide explains each control in detail.
1. Scope repo trust to content, not just paths
Most developers understand that cloning a random repository can be risky. The new agent-specific problem is that trust can persist after the repo changes.
If a developer approved a repository last month, then pulls a new commit today, the agent may still treat that working directory as trusted. But the files that drive agent behavior may have changed:
- agent instruction files
- MCP server definitions
- hook scripts
- package manager scripts
- workspace settings
- tool allowlists
- CI workflow templates
The safe policy is simple:
If executable or agent-controlling config changes, require review again.
This does not have to mean a giant security ceremony. It can be a lightweight rule:
- alert when agent config changes
- require human approval before starting tools from changed config
- block PRs that modify agent config without owner review
- scan agent config like you scan code
Treat developer machines like runners with credentials, not harmless chat terminals.
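Here is a minimal sketch of the first rule in Python. The file list and the baseline manifest name are assumptions; adapt them to your agent setup.

```python
import hashlib
import json
import sys
from pathlib import Path

# Files that steer agent behavior; extend to match your repo.
AGENT_CONFIG = [
    "AGENTS.md",
    "CLAUDE.md",
    ".cursor/rules",
    ".github/copilot-instructions.md",
]
MANIFEST = Path(".agent-config.lock.json")  # hypothetical baseline file

def digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def current_state() -> dict:
    return {p: digest(Path(p)) for p in AGENT_CONFIG if Path(p).is_file()}

def main() -> int:
    state = current_state()
    if not MANIFEST.exists():
        MANIFEST.write_text(json.dumps(state, indent=2, sort_keys=True))
        print("Recorded agent config baseline; review it once, then commit.")
        return 0
    baseline = json.loads(MANIFEST.read_text())
    changed = sorted(
        p for p in set(state) | set(baseline) if baseline.get(p) != state.get(p)
    )
    if changed:
        print("Agent config changed since approval:", ", ".join(changed))
        return 1  # block the agent run until a human re-approves
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

Run it before every agent session. A nonzero exit means a human looks at the config diff first.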
2. Review the instructions that steer the agent
Agent instruction files are part of your security posture.
Examples include:
- AGENTS.md
- CLAUDE.md
- GEMINI.md
- .cursor/rules
- .github/copilot-instructions.md
- local agent config under .codex, .claude, .gemini, or similar directories
These files may not execute code directly, but they influence what the agent reads, ignores, changes, and prioritizes.
A hostile or careless instruction file can tell the agent to:
- ignore security warnings
- skip tests
- prefer broad permissions for speed
- use a specific package without justification
- hide generated code in large refactors
- avoid mentioning files it changed
That means instruction files need review rules.
At minimum:
- do not let untrusted contributors modify agent instructions without review
- mark agent instruction changes as security-sensitive in CODEOWNERS
- keep instructions short and explicit
- tell agents to preserve auth, validation, tenant scoping, and audit logging
- tell agents to explain security-relevant changes in PR descriptions
Instruction files are not docs. They are behavior-shaping inputs.
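The CODEOWNERS rule is easy to automate. Here is a minimal sketch, assuming CODEOWNERS lives in one of GitHub's standard locations; the containment check is deliberately naive.

```python
from pathlib import Path

# Agent instruction files that should always require owner review.
INSTRUCTION_FILES = [
    "AGENTS.md",
    "CLAUDE.md",
    "GEMINI.md",
    ".cursor/rules",
    ".github/copilot-instructions.md",
]

def codeowners_text() -> str:
    # GitHub looks for CODEOWNERS in these three locations.
    for candidate in ("CODEOWNERS", ".github/CODEOWNERS", "docs/CODEOWNERS"):
        path = Path(candidate)
        if path.exists():
            return path.read_text()
    return ""

def uncovered() -> list:
    # Naive substring check; real CODEOWNERS patterns are glob-like,
    # so treat misses as "please look", not a verdict.
    owners = codeowners_text()
    return [f for f in INSTRUCTION_FILES if Path(f).exists() and f not in owners]

if __name__ == "__main__":
    for f in uncovered():
        print(f"agent instruction file without an owner rule: {f}")
```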
3. Lock down tool access before the first task
The fastest way to make an agent dangerous is to give it a vague task and broad tools.
For coding agents, sensitive tools include:
- shell commands
- package managers
- Docker
- cloud CLIs
- database clients
- deployment commands
- browser automation
- MCP tools
- ticketing or chat tools
- secret managers
The default should be least privilege.
For a normal code-editing task, the agent usually needs:
- read/write access to the repository
- test command access
- local scanner access
- maybe package manager read/install access with approval
It usually does not need:
- production credentials
- write access to cloud resources
- unrestricted network calls
- deploy permission
- database mutation access
- permission to push directly to protected branches
Do not wait for an incident to define this boundary. Write it down before the agent starts doing real work.
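Writing it down can be as simple as a policy table in code. The tool names and policy shape below are illustrative assumptions; no agent reads this format natively, but a reviewable, versioned file like this beats tribal knowledge.

```python
# Illustrative tool policy: deny by default, ask for the risky-but-needed.
TOOL_POLICY = {
    "shell": {"allowed": True, "requires_approval": False},
    "package_manager": {"allowed": True, "requires_approval": True},
    "cloud_cli": {"allowed": False},
    "database_write": {"allowed": False},
    "deploy": {"allowed": False},
    "push_protected_branch": {"allowed": False},
}

def tool_decision(tool: str) -> str:
    policy = TOOL_POLICY.get(tool, {"allowed": False})
    if not policy.get("allowed", False):
        return "deny"
    return "ask" if policy.get("requires_approval", False) else "allow"

assert tool_decision("deploy") == "deny"          # unknown or denied tools fail closed
assert tool_decision("package_manager") == "ask"  # installs need a human
```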
4. Require approval for sensitive actions
Some actions should never be automatic in an agent workflow.
Require human approval for:
- installing or upgrading dependencies
- changing lockfiles
- editing CI workflows
- editing deployment config
- changing auth, billing, export, or admin routes
- running migrations
- deleting files or data
- calling external APIs
- reading secrets
- making network requests outside expected package registries
- pushing commits or opening PRs from a privileged bot account
This is not about slowing the agent down. It is about making the trust boundary visible.
If the agent wants to do something sensitive, it should produce a short explanation:
- What action it wants to take.
- Why it is needed.
- What files or systems it affects.
- How to roll it back.
That explanation becomes review material.
5. Gate dependency changes harder than code changes
AI agents are comfortable adding dependencies.
That is a problem because dependency changes can pass tests while making the system less secure.
The DepDec-Bench authors report that in their preliminary study of 117,062 dependency changes, AI agents selected PR-time known-vulnerable versions in 2.46% of agent-authored dependency changes and had a net-negative security impact overall.
The practical lesson:
Do not let agents add packages casually.
Every agent-authored dependency change should answer:
- Why is a new dependency needed?
- Is there an existing dependency that already solves this?
- Is the package actively maintained?
- Is the package name real, or could it be hallucinated?
- Is the version known-vulnerable?
- Does the package run install scripts?
- What transitive dependencies does it add?
- Can this be done with the standard library or existing code instead?
For Python teams, this matters twice:
- PyPI has a flat namespace, which makes hallucinated names easy to register.
- AI workflows commonly involve fast-moving packages in LLM, data, and agent ecosystems.
If an agent adds a dependency, make that a security event, not a routine line item.
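One cheap gate: check agent-added versions against a vulnerability database before accepting the change. Here is a minimal sketch against the public OSV API (api.osv.dev); the CLI shape is an assumption.

```python
import json
import sys
import urllib.request

def known_vulns(name: str, version: str) -> list:
    """Query OSV for advisories affecting a specific PyPI package version."""
    query = json.dumps({
        "package": {"name": name, "ecosystem": "PyPI"},
        "version": version,
    }).encode()
    req = urllib.request.Request(
        "https://api.osv.dev/v1/query",
        data=query,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return [v["id"] for v in body.get("vulns", [])]

if __name__ == "__main__":
    pkg, ver = sys.argv[1], sys.argv[2]
    ids = known_vulns(pkg, ver)
    if ids:
        print(f"{pkg}=={ver} has known advisories: {', '.join(ids)}")
        sys.exit(1)  # fail the gate; a human decides what happens next
    print(f"No known OSV advisories for {pkg}=={ver}")
```

A nonexistent package name also returns no results, so pair this with a registry existence check to catch hallucinated names.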
Related: Slopsquatting in Python: What Hallucinated Package Names Mean for Your Supply Chain
6. Scan for removed controls, not just new bugs
This is the big one for AI-assisted refactors.
Traditional review asks:
What did this PR add?
AI PR review also has to ask:
What did this PR remove?
Look for removed or weakened:
- auth decorators
- permission checks
- tenant or organization filters
- request validation
- CSRF checks
- rate limits
- output encoding
- audit logs
- billing gates
- feature gates
- owner/admin role checks
- safe redirect validation
- security headers
A refactor can make code shorter and more readable while deleting the line that made it safe.
Example:
```python
def get_project(project_id: str, user: User):
    return (
        db.query(Project)
        .filter(Project.id == project_id)
        .filter(Project.org_id == user.org_id)  # tenant boundary
        .one()
    )
```

The agent changes it to:

```python
def get_project(project_id: str):
    return db.query(Project).filter(Project.id == project_id).one()
```
The function still works. The test for "loads project by ID" still passes. The tenant boundary is gone.
This is exactly where diff-aware scanning helps. You need to compare before and after, not just inspect the final state.
Related: AI Code Review for Security: A PR Checklist
7. Run local checks before the PR exists
Do not wait until review to discover that the agent generated:
- unused functions
- dead routes
- risky imports
- unvalidated request paths
- secrets
- broken type assumptions
- security-sensitive config changes
- hallucinated package names
Run checks locally before the branch leaves the developer machine.
For Skylos, that means:

```bash
skylos . -a
```

For changed-code review:

```bash
skylos . --diff origin/main
```

Then put the same check in CI:

```bash
skylos cicd init
```
The goal is not to replace human review. The goal is to keep humans from spending their attention on issues a machine can catch.
8. Keep agent PRs small by policy
AI agents are good at producing large diffs.
Large diffs are where security review goes to die.
Use a hard policy:
- small PRs are allowed
- medium PRs need a stronger explanation
- large PRs must be split unless there is a clear migration plan
A useful default:
| PR size | Policy |
|---|---|
| Under 200 changed lines | Normal review |
| 200 to 500 changed lines | Require focused summary and risk notes |
| Over 500 changed lines | Split unless approved by a maintainer |
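A minimal sketch of enforcing those thresholds locally, assuming the PR branches from origin/main:

```python
import subprocess
import sys

def changed_lines(base: str = "origin/main") -> int:
    # --numstat prints "added<TAB>deleted<TAB>path" per file.
    out = subprocess.run(
        ["git", "diff", "--numstat", base],
        capture_output=True, text=True, check=True,
    ).stdout
    total = 0
    for line in out.splitlines():
        added, deleted, _path = line.split("\t", 2)
        if added != "-":  # binary files report "-" for both counts
            total += int(added) + int(deleted)
    return total

if __name__ == "__main__":
    n = changed_lines()
    if n > 500:
        print(f"{n} changed lines: split this PR or get maintainer approval")
        sys.exit(1)
    if n > 200:
        print(f"{n} changed lines: add a focused summary and risk notes")
    else:
        print(f"{n} changed lines: normal review")
```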
This is not about line count purity. It is about preserving review quality.
If an agent changes auth, billing, exports, webhooks, dependencies, and UI copy in one PR, the PR is not ready. Split it by risk boundary.
9. Require negative tests for security-sensitive changes
AI agents tend to write happy-path tests.
For security-sensitive code, happy-path tests are not enough.
Require negative tests for:
- unauthenticated requests
- authenticated but unauthorized users
- cross-tenant IDs
- invalid input shapes
- oversized request bodies
- expired or revoked tokens
- malformed signatures
- duplicate webhook deliveries
- rate-limit exhaustion
- billing or credit edge cases
- export injection payloads
If the PR touches a route that requires admin, there should be a test proving a viewer cannot use it.
If the PR touches tenant-scoped data, there should be a test proving another tenant cannot read it.
If the PR touches a CSV export, there should be a test with a formula-leading cell.
Agents can write these tests. They just need to be required.
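A minimal sketch of what those required tests look like with pytest. The client fixture, routes, and roles are hypothetical; adapt them to your app.

```python
def test_unauthenticated_request_is_rejected(client):
    resp = client.get("/projects/p1")  # no token or session
    assert resp.status_code == 401

def test_viewer_cannot_call_admin_route(client, viewer_session):
    resp = client.post("/admin/exports", headers=viewer_session)
    assert resp.status_code == 403

def test_cross_tenant_id_is_not_readable(client, tenant_b_session):
    # project p1 belongs to tenant A; tenant B should see a 404, not data
    resp = client.get("/projects/p1", headers=tenant_b_session)
    assert resp.status_code == 404

def test_csv_export_neutralizes_formula_cells(client, admin_session):
    # assumes a project name was seeded as "=HYPERLINK(...)"; the export
    # must not contain formula-leading fields
    resp = client.get("/projects/p1/export.csv", headers=admin_session)
    for cell in resp.text.replace("\n", ",").split(","):
        assert not cell.startswith(("=", "+", "@"))
```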
10. Treat generated config like generated code
Coding agents do not only change application code.
They change:
- package.json
- pyproject.toml
- requirements.txt
- lockfiles
- Dockerfiles
- GitHub Actions workflows
- Vercel or deployment config
- MCP config
- test runner config
- linter config
- environment examples
That can change what runs, where it runs, and with which credentials.
Review generated config for:
- new scripts that execute shell commands
- new install hooks
- widened workflow permissions
- pull_request_target misuse
- id-token: write in untrusted contexts
- secrets exposed to PRs
- caches keyed too broadly
- deployment steps on untrusted branches
- package manager overrides
- disabled security checks
Config is code when it changes execution.
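A minimal sketch of a crude pre-review pass over GitHub Actions workflows. String matching is deliberately naive; a real check would parse the YAML.

```python
from pathlib import Path

# Patterns worth a human look, per the list above.
RISKY = ["pull_request_target", "id-token: write", "permissions: write-all"]

def scan_workflows(root: str = ".github/workflows") -> list:
    hits = []
    if not Path(root).is_dir():
        return hits
    for wf in Path(root).glob("*.y*ml"):  # .yml and .yaml
        text = wf.read_text()
        for pattern in RISKY:
            if pattern in text:
                hits.append((str(wf), pattern))
    return hits

if __name__ == "__main__":
    for path, pattern in scan_workflows():
        print(f"review needed: {path} contains '{pattern}'")
```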
11. Log who or what authored the change
As agent workflows mature, teams need provenance.
At minimum, track:
- whether the change was human-authored, AI-assisted, or agent-authored
- which tool produced it
- which human approved the tool run
- which files changed
- whether dependencies changed
- which scanner results were attached
- which human approved the final PR
This does not need to be perfect. It needs to be useful during review and incident response.
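A minimal sketch of what such a record could look like; the field names are illustrative, not a standard schema.

```python
from dataclasses import asdict, dataclass, field
import json

@dataclass
class ChangeProvenance:
    pr_url: str
    authorship: str              # "human" | "ai-assisted" | "agent-authored"
    tool: str = ""               # e.g. "claude-code", "cursor"
    tool_run_approved_by: str = ""
    files_changed: list = field(default_factory=list)
    dependencies_changed: bool = False
    scan_result_url: str = ""
    final_approver: str = ""

# Hypothetical record attached to a PR as a comment or artifact.
record = ChangeProvenance(
    pr_url="https://github.com/example/repo/pull/123",
    authorship="agent-authored",
    tool="claude-code",
    tool_run_approved_by="alice",
    files_changed=["app/export.py", "tests/test_export.py"],
    dependencies_changed=False,
    scan_result_url="https://ci.example.com/runs/456",
    final_approver="bob",
)
print(json.dumps(asdict(record), indent=2))
```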
When a production issue is traced to a PR, you should be able to answer:
- Did an agent write this?
- Did the agent run tools?
- Did it add dependencies?
- Did it touch auth, billing, secrets, exports, or CI?
- Did the PR pass a deterministic scan?
- Who approved the final diff?
That is the audit trail that makes agent adoption survivable.
12. Block high-confidence regressions before merge
AI agent output increases volume. Advisory-only security comments do not scale with volume.
Put the highest-confidence checks in the merge gate:
- secrets
- critical/high security findings
- removed auth or validation in sensitive routes
- suspicious dependency additions
- changed agent/tool config without owner review
- unsafe CI permission changes
- known-vulnerable package versions
- public exports without injection protection
- missing negative tests for security-sensitive routes
Keep lower-confidence findings as review notes. Block the things you already know you would never knowingly merge.
This is the difference between "we use AI" and "we can safely scale AI-assisted development."
A practical workflow for AI agent PRs
Here is a sane default workflow for a small team.
Before the agent starts
- Start from a clean branch.
- Give the agent a narrow task.
- Keep production credentials out of the environment.
- Require approval for installs, migrations, deletes, and network calls.
- Tell the agent which files are security-sensitive.
While the agent works
- Review tool requests.
- Reject broad refactors.
- Ask for smaller commits or smaller patches.
- Require explanations for dependency and config changes.
- Stop the run if the agent drifts from the task.
Before the PR opens
- Run tests.
- Run local static analysis.
- Review dependency changes.
- Review removed controls.
- Split the PR if it crosses too many risk boundaries.
In CI
- Re-run the same scanner.
- Block high-confidence security findings.
- Require CODEOWNER review for auth, billing, exports, CI, dependencies, and agent config.
- Attach scan results to the PR.
- Record whether the PR was agent-authored or agent-assisted.
That is enough to start. You can add more governance later.
Where Skylos fits
Skylos is not an agent sandbox. It does not control whether an AI coding agent can run a shell command or read a secret.
Skylos fits after the agent writes code and before the code is trusted.
Use it to catch:
- security issues in generated code
- dead code left behind by agent refactors
- hallucinated or risky imports
- removed controls in changed code
- AI-assisted regressions around auth, validation, and similar guardrails
- CI-blocking findings before a human reviewer spends time on the PR
The local-first path is:
```bash
pip install skylos
skylos . -a
```

Then add CI:

```bash
skylos cicd init
```
If your team already uses Semgrep, CodeQL, Snyk, SonarQube, or GitHub Advanced Security, keep them. Skylos is the sidecar for AI-heavy repos where removed controls, dead code, and changed-code verification matter.
The bottom line
AI coding agents are not bad. Ungated AI coding agents are bad.
The right posture is not "ban agents" or "trust agents." It is:
- Limit what they can do.
- Review the configuration that controls them.
- Treat dependency changes as security-sensitive.
- Scan the diff before review.
- Block high-confidence regressions before merge.
That is how teams get the speed benefit without turning every production repository into an experiment.