How to Scan LLM Applications for Prompt Injection, Data Leaks, and Missing Guardrails

If your app uses LLMs, the model call is only one part of the security story.

The real risk usually sits in the application logic around it:

  • what untrusted input reaches the prompt
  • what retrieved data enters context
  • what tools the model can influence
  • what happens to model output afterward
  • whether secrets, PII, or tenant data cross boundaries they should not cross

That is why "just prompt better" is not a security program.

You need to scan the surrounding code.
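As a minimal sketch of the first point in that list, here is the difference between concatenating untrusted input into the instruction string and keeping it in its own slot. The names (`SYSTEM_PROMPT`, the builder functions) are illustrative, not any specific SDK's API:

```python
# Sketch: keep instructions and untrusted text in separate message slots
# instead of concatenating them into one instruction string.
# All names here are illustrative, not a particular provider's API.

SYSTEM_PROMPT = "You are a support assistant. Answer only from the provided context."

def build_messages_unsafe(user_input: str) -> list[dict]:
    # Anti-pattern: untrusted text lands inside the instruction string,
    # so "ignore previous instructions" style payloads blend into it.
    return [{"role": "system", "content": SYSTEM_PROMPT + "\n" + user_input}]

def build_messages(user_input: str) -> list[dict]:
    # Untrusted content stays in its own user-role message. Role
    # separation is one boundary, not the whole defense.
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_input},
    ]
```

Role separation alone does not stop injection, but it is the precondition for every downstream control: once instructions and untrusted text are mixed in one string, no later layer can reliably tell them apart.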


The code surfaces that matter most

  • Prompt construction: untrusted input concatenated directly into instructions
  • RAG / retrieval: cross-tenant or over-broad context included in prompts
  • Tool calling: model output influences sensitive actions without enough review
  • Output handling: unsanitized or over-trusted responses reach users or systems
  • Secrets and config: API keys, internal URLs, or fallback credentials leak into code
  • Rate and cost controls: no practical limit on expensive or abusable model paths

If your codebase exposes any of those, treat it like application security work, not just AI experimentation.


Start with discovery

Before you secure an LLM application, find every place the model actually appears.

Run:

skylos discover .

Use that output to map:

  • direct OpenAI, Anthropic, or other model SDK calls
  • prompt builder modules
  • retrieval and vector search integrations
  • tool wrappers
  • policy or moderation layers
  • output post-processing

Teams are often surprised by how many indirect LLM entry points have accumulated in helper code, admin tools, and internal services.


Then scan the AI-specific risks

Run:

skylos defend .

This is the right step when you need to inspect:

  • prompt injection exposure
  • missing input validation
  • weak output sanitization
  • retrieval or tenant-isolation issues
  • missing PII filtering
  • weak cost or rate controls

The point is not to prove the model is safe in some abstract sense. The point is to find where the code around the model makes unsafe behavior possible.
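As one illustration of the PII-filtering item, a deliberately simplified redaction pass that could run before text enters logs or model context. The function name and patterns are hypothetical, and production filters need far broader coverage than two regexes:

```python
import re

# Sketch of a pre-logging / pre-context PII filter. The two patterns
# below are deliberately simple placeholders; real filters need much
# broader coverage (names, addresses, account numbers, locale formats).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact_pii(text: str) -> str:
    # Replace matches with stable placeholder tokens so downstream
    # systems can see that a value existed without seeing the value.
    text = EMAIL.sub("[EMAIL]", text)
    text = SSN.sub("[SSN]", text)
    return text
```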


Prompt injection is usually a systems problem

Teams often talk about prompt injection like it starts and ends with a malicious sentence.

In production, it is usually a flow problem:

  1. untrusted content enters the system
  2. the application forwards it into model context
  3. the model output influences a tool, decision, or response
  4. the app trusts that result too much

If you break any one of those steps with a good control, the blast radius of an incident shrinks substantially.

That is why scanning should focus on data flow and control boundaries, not just prompt wording.
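One way to break steps 3 and 4 of that flow, sketched with hypothetical tool names: the application, not the model, decides which tools may run and which require human confirmation. Model output can request a tool, but the gate owns the decision:

```python
# Sketch of a control boundary on the tool-calling step. The tool
# names and categories are illustrative; the point is that the
# application owns the allow/deny decision, not the model output.

SAFE_TOOLS = {"search_docs", "get_order_status"}
CONFIRM_TOOLS = {"refund_order", "delete_account"}

def gate_tool_call(tool_name: str, confirmed: bool = False) -> str:
    if tool_name in SAFE_TOOLS:
        return "allow"
    if tool_name in CONFIRM_TOOLS:
        # Sensitive actions run only after explicit human confirmation.
        return "allow" if confirmed else "needs_confirmation"
    # Unknown tools never run, even if the model asks for them.
    return "deny"
```

A gate like this means an injected instruction can at worst request a sensitive action, not execute one, which is exactly the "smaller incident" the flow analysis above is after.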


What a secure default looks like

For most LLM applications, the initial guardrails should include:

  • explicit input validation on user and retrieved content
  • clear separation between instructions and untrusted text
  • output sanitization before rendering or executing results
  • narrow tool permissions
  • tenant-aware retrieval boundaries
  • cost and rate limits on model paths
  • logging that avoids leaking secrets or personal data

If you are using MCP-connected agents inside the product or engineering workflow, add the controls from How to Secure an MCP Server Before You Trust It With Your Code.


Make it part of the merge workflow

Once discovery and defense scans are in place, add the normal repo gate too:

skylos . -a
skylos cicd init

That gives you:

  1. general static analysis
  2. AI-specific application checks
  3. a repeatable PR gate

Without that third step, AI features tend to drift into a weaker state as new tools, prompts, and wrappers get added under delivery pressure.


What to review manually after the scan

Use the scan to narrow the review, then ask:

  • Can untrusted text influence a privileged tool call?
  • Can one tenant's context leak into another tenant's answer?
  • Can model output trigger a sensitive action without human confirmation?
  • Are retries, fallbacks, or debug paths bypassing policy checks?
  • Did a refactor remove rate limits, moderation, or PII filtering?

That last question matters more than teams think. AI features often regress through cleanup changes, not just through brand-new implementation mistakes.

