RAG Security and Prompt Injection Defense

Treat retrieved text as data, not instructions

External documents, web pages, tickets, and emails should not be allowed to change system rules. The model should summarize or cite them, not obey hidden instructions embedded inside them.

Use prompts and tool policies that explicitly separate trusted system instructions from untrusted retrieved passages.

Preserve access control

The retriever must enforce user permissions before content reaches the model. Filtering after generation is too late because sensitive text may already influence the answer.

Test with users who should see different subsets of the same knowledge base. Permission-aware retrieval is a core security requirement, not a nice-to-have.

Detect unsupported answers

Require citations for material claims and reject answers that cannot cite retrieved evidence. Add a groundedness check when answers affect money, policy, health, legal, or customer commitments.

Monitor failed citations and user corrections. They reveal data gaps faster than aggregate satisfaction scores.

Practical checklist

1Mark retrieved passages as untrusted.
2Enforce permissions before retrieval.
3Require citations for material claims.
4Test malicious documents.
5Monitor unsupported answers.

Related comparisons

GPT-5.5 vs Claude Opus for Professional Work Claude Code vs Cursor OpenAI vs Gemini for Product Teams