AI Coding Agent Security Checklist
AI coding agents are not just autocomplete anymore.
They can read your repository, edit several files, run commands, install packages, update tests, call MCP tools, and open pull requests. Some teams already let them handle small fixes end to end.
That is useful. It is also a different security model.
When an agent only suggests a function, the main question is:
Is this generated code safe?
When an agent can take actions inside a repository, the question becomes:
What can this tool chain do if the prompt is wrong, the repo instructions are hostile, a dependency is malicious, or the generated PR removes a security control?
This guide gives you a practical AI coding agent security checklist for teams using Claude Code, Cursor, Codex, Copilot, Devin, Windsurf, Gemini CLI, OpenHands, or any internal coding agent.
If you want the short version, here it is:
Treat every AI coding agent like an untrusted contributor with a fast keyboard and tool access. Let it help, but verify every security-relevant action before merge.
Why this matters now
The risk has moved from snippets to workflows.
Research keeps pointing in the same direction:
- The SecureVibeBench preprint evaluates code agents on realistic multi-file tasks based on OSS-Fuzz vulnerability-introducing scenarios. The authors report that even the best-performing agent produced correct and secure solutions on only 23.8% of tasks.
- The DepDec-Bench preprint looks at dependency decision-making and reports that AI agents selected PR-time known-vulnerable dependency versions in 2.46% of studied agent-authored dependency changes, with a net-negative security impact compared with human-authored changes.
- Snyk's ToxicSkills research scanned 3,984 agent skills; it reported at least one security flaw in 36.82% of them and critical issues in 13.4%.
- Mindgard's trust-persistence writeup argues that coding-agent trust decisions can become stale when project-controlled configuration changes after a folder has already been approved.
You do not need to agree with every methodology to see the pattern.
AI agents are now part of the development supply chain. They consume repo instructions, execute project tooling, choose dependencies, and create diffs. If your controls only check the final code, you are missing the path that produced it.
The checklist
Use this before letting AI agents work on production repositories.
| Control | What to check | Why it matters |
|---|---|---|
| 1. Scope repo trust | Agents should not inherit permanent trust for any future repo config change | A trusted folder can become unsafe after a malicious or compromised commit |
| 2. Review agent instructions | AGENTS.md, CLAUDE.md, .cursor/rules, .github/copilot-instructions.md, and similar files | These files can steer the agent's behavior without looking like code |
| 3. Lock down tool access | Shell, package manager, cloud CLI, database, browser, MCP, and file write permissions | Tool permissions define the blast radius |
| 4. Require approval for sensitive actions | Installs, migrations, secrets access, deletes, deploys, external network calls | The agent should not silently perform irreversible or high-risk work |
| 5. Gate dependency changes | New packages, version bumps, transitive dependency changes, install scripts | Agents can choose vulnerable or unnecessary packages that tests do not flag |
| 6. Scan for removed controls | Deleted auth, tenant filters, validation, rate limits, CSRF checks, audit logs | AI refactors often look clean while removing security boundaries |
| 7. Scan generated code locally | Run SAST, secret detection, dead code, and quality checks before opening the PR | Do not make human reviewers catch machine-checkable problems |
| 8. Keep PRs small | Split large agent diffs and reject broad "cleanup" PRs | Big AI PRs hide removed controls and unreviewed behavior changes |
| 9. Require negative tests | Unauthenticated, unauthorized, cross-tenant, invalid input, rate-limit, and failure tests | Happy-path tests prove the feature works, not that abuse fails |
| 10. Treat generated config as risky | CI workflows, package scripts, Dockerfiles, MCP config, hooks, env examples | Configuration can execute code or widen access |
| 11. Log agent-authored changes | Actor, prompt/source, tool calls when available, PR link, scan result | Auditability matters when the change was produced by a tool chain |
| 12. Block before merge | CI gate for high-confidence security regressions | Advisory comments do not scale when agent output increases PR volume |
The rest of this guide explains each control in detail.
1. Scope repo trust to content, not just paths
Most developers understand that cloning a random repository can be risky. The new agent-specific problem is that trust can persist after the repo changes.
If a developer approved a repository last month, then pulls a new commit today, the agent may still treat that working directory as trusted. But the files that drive agent behavior may have changed:
- agent instruction files
- MCP server definitions
- hook scripts
- package manager scripts
- workspace settings
- tool allowlists
- CI workflow templates
The safe policy is simple:
If executable or agent-controlling config changes, require review again.
This does not have to mean a giant security ceremony. It can be a lightweight rule:
- alert when agent config changes
- require human approval before starting tools from changed config
- block PRs that modify agent config without owner review
- scan agent config like you scan code
Treat developer machines like runners with credentials, not harmless chat terminals.
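Here is a minimal sketch of the first rule in Python. The file list and the baseline manifest name are assumptions; adapt them to your agent setup.

```python
import hashlib
import json
import sys
from pathlib import Path

# Files that steer agent behavior; extend to match your repo.
AGENT_CONFIG = [
    "AGENTS.md",
    "CLAUDE.md",
    ".cursor/rules",
    ".github/copilot-instructions.md",
]
MANIFEST = Path(".agent-config.lock.json")  # hypothetical baseline file

def digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def current_state() -> dict:
    return {p: digest(Path(p)) for p in AGENT_CONFIG if Path(p).is_file()}

def main() -> int:
    state = current_state()
    if not MANIFEST.exists():
        MANIFEST.write_text(json.dumps(state, indent=2, sort_keys=True))
        print("Recorded agent config baseline; review it once, then commit.")
        return 0
    baseline = json.loads(MANIFEST.read_text())
    changed = sorted(
        p for p in set(state) | set(baseline) if baseline.get(p) != state.get(p)
    )
    if changed:
        print("Agent config changed since approval:", ", ".join(changed))
        return 1  # block the agent run until a human re-approves
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

Run it before every agent session. A nonzero exit means a human looks at the config diff first.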
2. Review the instructions that steer the agent
Agent instruction files are part of your security posture.
Examples include:
- AGENTS.md
- CLAUDE.md
- GEMINI.md
- .cursor/rules
- .github/copilot-instructions.md
- local agent config under .codex, .claude, .gemini, or similar directories
These files may not execute code directly, but they influence what the agent reads, ignores, changes, and prioritizes.
A hostile or careless instruction file can tell the agent to:
- ignore security warnings
- skip tests
- prefer broad permissions for speed
- use a specific package without justification
- hide generated code in large refactors
- avoid mentioning files it changed
That means instruction files need review rules.
At minimum:
- do not let untrusted contributors modify agent instructions without review
- mark agent instruction changes as security-sensitive in CODEOWNERS
- keep instructions short and explicit
- tell agents to preserve auth, validation, tenant scoping, and audit logging
- tell agents to explain security-relevant changes in PR descriptions
Instruction files are not docs. They are behavior-shaping inputs.
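The CODEOWNERS rule is easy to automate. Here is a minimal sketch, assuming CODEOWNERS lives in one of GitHub's standard locations; the containment check is deliberately naive.

```python
from pathlib import Path

# Agent instruction files that should always require owner review.
INSTRUCTION_FILES = [
    "AGENTS.md",
    "CLAUDE.md",
    "GEMINI.md",
    ".cursor/rules",
    ".github/copilot-instructions.md",
]

def codeowners_text() -> str:
    # GitHub looks for CODEOWNERS in these three locations.
    for candidate in ("CODEOWNERS", ".github/CODEOWNERS", "docs/CODEOWNERS"):
        path = Path(candidate)
        if path.exists():
            return path.read_text()
    return ""

def uncovered() -> list:
    # Naive substring check; real CODEOWNERS patterns are glob-like,
    # so treat misses as "please look", not a verdict.
    owners = codeowners_text()
    return [f for f in INSTRUCTION_FILES if Path(f).exists() and f not in owners]

if __name__ == "__main__":
    for f in uncovered():
        print(f"agent instruction file without an owner rule: {f}")
```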
3. Lock down tool access before the first task
The fastest way to make an agent dangerous is to give it a vague task and broad tools.
For coding agents, sensitive tools include:
- shell commands
- package managers
- Docker
- cloud CLIs
- database clients
- deployment commands
- browser automation
- MCP tools
- ticketing or chat tools
- secret managers
The default should be least privilege.
For a normal code-editing task, the agent usually needs:
- read/write access to the repository
- test command access
- local scanner access
- maybe package manager read/install access with approval
It usually does not need:
- production credentials
- write access to cloud resources
- unrestricted network calls
- deploy permission
- database mutation access
- permission to push directly to protected branches
Do not wait for an incident to define this boundary. Write it down before the agent starts doing real work.
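Writing it down can be as simple as a policy table in code. The tool names and policy shape below are illustrative assumptions; no agent reads this format natively, but a reviewable, versioned file like this beats tribal knowledge.

```python
# Illustrative tool policy: deny by default, ask for the risky-but-needed.
TOOL_POLICY = {
    "shell": {"allowed": True, "requires_approval": False},
    "package_manager": {"allowed": True, "requires_approval": True},
    "cloud_cli": {"allowed": False},
    "database_write": {"allowed": False},
    "deploy": {"allowed": False},
    "push_protected_branch": {"allowed": False},
}

def tool_decision(tool: str) -> str:
    policy = TOOL_POLICY.get(tool, {"allowed": False})
    if not policy.get("allowed", False):
        return "deny"
    return "ask" if policy.get("requires_approval", False) else "allow"

assert tool_decision("deploy") == "deny"          # unknown or denied tools fail closed
assert tool_decision("package_manager") == "ask"  # installs need a human
```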
4. Require approval for sensitive actions
Some actions should never be automatic in an agent workflow.
Require human approval for:
- installing or upgrading dependencies
- changing lockfiles
- editing CI workflows
- editing deployment config
- changing auth, billing, export, or admin routes
- running migrations
- deleting files or data
- calling external APIs
- reading secrets
- making network requests outside expected package registries
- pushing commits or opening PRs from a privileged bot account
This is not about slowing the agent down. It is about making the trust boundary visible.
If the agent wants to do something sensitive, it should produce a short explanation:
- What action it wants to take.
- Why it is needed.
- What files or systems it affects.
- How to roll it back.
That explanation becomes review material.
5. Gate dependency changes harder than code changes
AI agents are comfortable adding dependencies.
That is a problem because dependency changes can pass tests while making the system less secure.
The DepDec-Bench authors report that in their preliminary study of 117,062 dependency changes, AI agents selected PR-time known-vulnerable versions in 2.46% of agent-authored dependency changes and had a net-negative security impact overall.
The practical lesson:
Do not let agents add packages casually.
Every agent-authored dependency change should answer:
- Why is a new dependency needed?
- Is there an existing dependency that already solves this?
- Is the package actively maintained?
- Is the package name real, or could it be hallucinated?
- Is the version known-vulnerable?
- Does the package run install scripts?
- What transitive dependencies does it add?
- Can this be done with the standard library or existing code instead?
For Python teams, this matters twice:
- PyPI has a flat namespace, which makes hallucinated names easy to register.
- AI workflows commonly involve fast-moving packages in LLM, data, and agent ecosystems.
If an agent adds a dependency, make that a security event, not a routine line item.
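One cheap gate: check agent-added versions against a vulnerability database before accepting the change. Here is a minimal sketch against the public OSV API (api.osv.dev); the CLI shape is an assumption.

```python
import json
import sys
import urllib.request

def known_vulns(name: str, version: str) -> list:
    """Query OSV for advisories affecting a specific PyPI package version."""
    query = json.dumps({
        "package": {"name": name, "ecosystem": "PyPI"},
        "version": version,
    }).encode()
    req = urllib.request.Request(
        "https://api.osv.dev/v1/query",
        data=query,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return [v["id"] for v in body.get("vulns", [])]

if __name__ == "__main__":
    pkg, ver = sys.argv[1], sys.argv[2]
    ids = known_vulns(pkg, ver)
    if ids:
        print(f"{pkg}=={ver} has known advisories: {', '.join(ids)}")
        sys.exit(1)  # fail the gate; a human decides what happens next
    print(f"No known OSV advisories for {pkg}=={ver}")
```

A nonexistent package name also returns no results, so pair this with a registry existence check to catch hallucinated names.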
Related: Slopsquatting in Python: What Hallucinated Package Names Mean for Your Supply Chain
6. Scan for removed controls, not just new bugs
This is the big one for AI-assisted refactors.
Traditional review asks:
What did this PR add?
AI PR review also has to ask:
What did this PR remove?
Look for removed or weakened:
- auth decorators
- permission checks
- tenant or organization filters
- request validation
- CSRF checks
- rate limits
- output encoding
- audit logs
- billing gates
- feature gates
- owner/admin role checks
- safe redirect validation
- security headers
A refactor can make code shorter and more readable while deleting the line that made it safe.
Example:
```python
def get_project(project_id: str, user: User):
    return (
        db.query(Project)
        .filter(Project.id == project_id)
        .filter(Project.org_id == user.org_id)  # tenant boundary
        .one()
    )
```

The agent changes it to:

```python
def get_project(project_id: str):
    return db.query(Project).filter(Project.id == project_id).one()
```
The function still works. The test for "loads project by ID" still passes. The tenant boundary is gone.
This is exactly where diff-aware scanning helps. You need to compare before and after, not just inspect the final state.
Related: AI Code Review for Security: A PR Checklist
7. Run local checks before the PR exists
Do not wait until review to discover that the agent generated:
- unused functions
- dead routes
- risky imports
- unvalidated request paths
- secrets
- broken type assumptions
- security-sensitive config changes
- hallucinated package names
Run checks locally before the branch leaves the developer machine.
For Skylos, that means:

```bash
skylos . -a
```

For changed-code review:

```bash
skylos . --diff origin/main
```

Then put the same check in CI:

```bash
skylos cicd init
```
The goal is not to replace human review. The goal is to keep humans from spending their attention on issues a machine can catch.
8. Keep agent PRs small by policy
AI agents are good at producing large diffs.
Large diffs are where security review goes to die.
Use a hard policy:
- small PRs are allowed
- medium PRs need a stronger explanation
- large PRs must be split unless there is a clear migration plan
A useful default:
| PR size | Policy |
|---|---|
| Under 200 changed lines | Normal review |
| 200 to 500 changed lines | Require focused summary and risk notes |
| Over 500 changed lines | Split unless approved by a maintainer |
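A minimal sketch of enforcing those thresholds locally, assuming the PR branches from origin/main:

```python
import subprocess
import sys

def changed_lines(base: str = "origin/main") -> int:
    # --numstat prints "added<TAB>deleted<TAB>path" per file.
    out = subprocess.run(
        ["git", "diff", "--numstat", base],
        capture_output=True, text=True, check=True,
    ).stdout
    total = 0
    for line in out.splitlines():
        added, deleted, _path = line.split("\t", 2)
        if added != "-":  # binary files report "-" for both counts
            total += int(added) + int(deleted)
    return total

if __name__ == "__main__":
    n = changed_lines()
    if n > 500:
        print(f"{n} changed lines: split this PR or get maintainer approval")
        sys.exit(1)
    if n > 200:
        print(f"{n} changed lines: add a focused summary and risk notes")
    else:
        print(f"{n} changed lines: normal review")
```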
This is not about line count purity. It is about preserving review quality.
If an agent changes auth, billing, exports, webhooks, dependencies, and UI copy in one PR, the PR is not ready. Split it by risk boundary.
9. Require negative tests for security-sensitive changes
AI agents tend to write happy-path tests.
For security-sensitive code, happy-path tests are not enough.
Require negative tests for:
- unauthenticated requests
- authenticated but unauthorized users
- cross-tenant IDs
- invalid input shapes
- oversized request bodies
- expired or revoked tokens
- malformed signatures
- duplicate webhook deliveries
- rate-limit exhaustion
- billing or credit edge cases
- export injection payloads
If the PR touches a route that requires admin, there should be a test proving a viewer cannot use it.
If the PR touches tenant-scoped data, there should be a test proving another tenant cannot read it.
If the PR touches a CSV export, there should be a test with a formula-leading cell.
Agents can write these tests. They just need to be required.
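A minimal sketch of what those required tests look like with pytest. The client fixture, routes, and roles are hypothetical; adapt them to your app.

```python
def test_unauthenticated_request_is_rejected(client):
    resp = client.get("/projects/p1")  # no token or session
    assert resp.status_code == 401

def test_viewer_cannot_call_admin_route(client, viewer_session):
    resp = client.post("/admin/exports", headers=viewer_session)
    assert resp.status_code == 403

def test_cross_tenant_id_is_not_readable(client, tenant_b_session):
    # project p1 belongs to tenant A; tenant B should see a 404, not data
    resp = client.get("/projects/p1", headers=tenant_b_session)
    assert resp.status_code == 404

def test_csv_export_neutralizes_formula_cells(client, admin_session):
    # assumes a project name was seeded as "=HYPERLINK(...)"; the export
    # must not contain formula-leading fields
    resp = client.get("/projects/p1/export.csv", headers=admin_session)
    for cell in resp.text.replace("\n", ",").split(","):
        assert not cell.startswith(("=", "+", "@"))
```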
10. Treat generated config like generated code
Coding agents do not only change application code.
They change:
- package.json
- pyproject.toml
- requirements.txt
- lockfiles
- Dockerfiles
- GitHub Actions workflows
- Vercel or deployment config
- MCP config
- test runner config
- linter config
- environment examples
That can change what runs, where it runs, and with which credentials.
Review generated config for:
- new scripts that execute shell commands
- new install hooks
- widened workflow permissions
- pull_request_target misuse
- id-token: write in untrusted contexts
- secrets exposed to PRs
- caches keyed too broadly
- deployment steps on untrusted branches
- package manager overrides
- disabled security checks
Config is code when it changes execution.
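A minimal sketch of a crude pre-review pass over GitHub Actions workflows. String matching is deliberately naive; a real check would parse the YAML.

```python
from pathlib import Path

# Patterns worth a human look, per the list above.
RISKY = ["pull_request_target", "id-token: write", "permissions: write-all"]

def scan_workflows(root: str = ".github/workflows") -> list:
    hits = []
    if not Path(root).is_dir():
        return hits
    for wf in Path(root).glob("*.y*ml"):  # .yml and .yaml
        text = wf.read_text()
        for pattern in RISKY:
            if pattern in text:
                hits.append((str(wf), pattern))
    return hits

if __name__ == "__main__":
    for path, pattern in scan_workflows():
        print(f"review needed: {path} contains '{pattern}'")
```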
11. Log who or what authored the change
As agent workflows mature, teams need provenance.
At minimum, track:
- whether the change was human-authored, AI-assisted, or agent-authored
- which tool produced it
- which human approved the tool run
- which files changed
- whether dependencies changed
- which scanner results were attached
- which human approved the final PR
This does not need to be perfect. It needs to be useful during review and incident response.
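A minimal sketch of what such a record could look like; the field names are illustrative, not a standard schema.

```python
from dataclasses import asdict, dataclass, field
import json

@dataclass
class ChangeProvenance:
    pr_url: str
    authorship: str              # "human" | "ai-assisted" | "agent-authored"
    tool: str = ""               # e.g. "claude-code", "cursor"
    tool_run_approved_by: str = ""
    files_changed: list = field(default_factory=list)
    dependencies_changed: bool = False
    scan_result_url: str = ""
    final_approver: str = ""

# Hypothetical record attached to a PR as a comment or artifact.
record = ChangeProvenance(
    pr_url="https://github.com/example/repo/pull/123",
    authorship="agent-authored",
    tool="claude-code",
    tool_run_approved_by="alice",
    files_changed=["app/export.py", "tests/test_export.py"],
    dependencies_changed=False,
    scan_result_url="https://ci.example.com/runs/456",
    final_approver="bob",
)
print(json.dumps(asdict(record), indent=2))
```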
When a production issue is traced to a PR, you should be able to answer:
- Did an agent write this?
- Did the agent run tools?
- Did it add dependencies?
- Did it touch auth, billing, secrets, exports, or CI?
- Did the PR pass a deterministic scan?
- Who approved the final diff?
That is the audit trail that makes agent adoption survivable.
12. Block high-confidence regressions before merge
AI agent output increases volume. Advisory-only security comments do not scale with volume.
Put the highest-confidence checks in the merge gate:
- secrets
- critical/high security findings
- removed auth or validation in sensitive routes
- suspicious dependency additions
- changed agent/tool config without owner review
- unsafe CI permission changes
- known-vulnerable package versions
- public exports without injection protection
- missing negative tests for security-sensitive routes
Keep lower-confidence findings as review notes. Block the things you already know you would never knowingly merge.
This is the difference between "we use AI" and "we can safely scale AI-assisted development."
A practical workflow for AI agent PRs
Here is a sane default workflow for a small team.
Before the agent starts
- Start from a clean branch.
- Give the agent a narrow task.
- Keep production credentials out of the environment.
- Require approval for installs, migrations, deletes, and network calls.
- Tell the agent which files are security-sensitive.
While the agent works
- Review tool requests.
- Reject broad refactors.
- Ask for smaller commits or smaller patches.
- Require explanations for dependency and config changes.
- Stop the run if the agent drifts from the task.
Before the PR opens
- Run tests.
- Run local static analysis.
- Review dependency changes.
- Review removed controls.
- Split the PR if it crosses too many risk boundaries.
In CI
- Re-run the same scanner.
- Block high-confidence security findings.
- Require CODEOWNER review for auth, billing, exports, CI, dependencies, and agent config.
- Attach scan results to the PR.
- Record whether the PR was agent-authored or agent-assisted.
That is enough to start. You can add more governance later.
Where Skylos fits
Skylos is not an agent sandbox. It does not control whether an AI coding agent can run a shell command or read a secret.
Skylos fits after the agent writes code and before the code is trusted.
Use it to catch:
- security issues in generated code
- dead code left behind by agent refactors
- hallucinated or risky imports
- removed controls in changed code
- AI-assisted regressions around auth, validation, and similar guardrails
- CI-blocking findings before a human reviewer spends time on the PR
The local-first path is:
```bash
pip install skylos
skylos . -a
```

Then add CI:

```bash
skylos cicd init
```
If your team already uses Semgrep, CodeQL, Snyk, SonarQube, or GitHub Advanced Security, keep them. Skylos is the sidecar for AI-heavy repos where removed controls, dead code, and changed-code verification matter.
The bottom line
AI coding agents are not bad. Ungated AI coding agents are bad.
The right posture is not "ban agents" or "trust agents." It is:
- Limit what they can do.
- Review the configuration that controls them.
- Treat dependency changes as security-sensitive.
- Scan the diff before review.
- Block high-confidence regressions before merge.
That is how teams get the speed benefit without turning every production repository into an experiment.