SkillRank
Back to guides
Security9 minUpdated 2026-06-04

RAG Security and Prompt Injection Defense

RAG systems read untrusted text and then ask a model to act on it. That creates a specific class of risk: retrieved content can contain instructions, bad facts, or sensitive data the user should not see.

Treat retrieved text as data, not instructions

External documents, web pages, tickets, and emails should not be allowed to change system rules. The model should summarize or cite them, not obey hidden instructions embedded inside them.

Use prompts and tool policies that explicitly separate trusted system instructions from untrusted retrieved passages.

Preserve access control

The retriever must enforce user permissions before content reaches the model. Filtering after generation is too late because sensitive text may already influence the answer.

Test with users who should see different subsets of the same knowledge base. Permission-aware retrieval is a core security requirement, not a nice-to-have.

Detect unsupported answers

Require citations for material claims and reject answers that cannot cite retrieved evidence. Add a groundedness check when answers affect money, policy, health, legal, or customer commitments.

Monitor failed citations and user corrections. They reveal data gaps faster than aggregate satisfaction scores.

Practical checklist

  1. 1Mark retrieved passages as untrusted.
  2. 2Enforce permissions before retrieval.
  3. 3Require citations for material claims.
  4. 4Test malicious documents.
  5. 5Monitor unsupported answers.

Related comparisons