Treat retrieved text as data, not instructions
External documents, web pages, tickets, and emails should not be allowed to change system rules. The model should summarize or cite them, not obey hidden instructions embedded inside them.
Use prompts and tool policies that explicitly separate trusted system instructions from untrusted retrieved passages.
Preserve access control
The retriever must enforce user permissions before content reaches the model. Filtering after generation is too late because sensitive text may already influence the answer.
Test with users who should see different subsets of the same knowledge base. Permission-aware retrieval is a core security requirement, not a nice-to-have.
Detect unsupported answers
Require citations for material claims and reject answers that cannot cite retrieved evidence. Add a groundedness check when answers affect money, policy, health, legal, or customer commitments.
Monitor failed citations and user corrections. They reveal data gaps faster than aggregate satisfaction scores.
Practical checklist
- 1Mark retrieved passages as untrusted.
- 2Enforce permissions before retrieval.
- 3Require citations for material claims.
- 4Test malicious documents.
- 5Monitor unsupported answers.
Related comparisons