Improving Response Accuracy with Guardrails in AI Pipelines
Agree with everything Jason said about stopping responses when the retrieved docs don't actually answer the question, but you can also do some of this outside of prompting. At a high level, look into guardrails: they're typically used for safety, but you can apply the same idea in your pipeline as a post-generation check for hallucinations, inaccuracies, and so on. A very basic approach is to pass the query, the retrieved docs, and the generated output to an LLM and ask whether the output makes sense and is grounded in the input docs.

Techniques like RAG with citations similarly aim to ground responses, but there are lower-latency options too, such as checking how much overlap there is between keywords extracted from the reference docs and the generated output.

TL;DR: Jason is 100% correct on improving prompts and the actual retrieval. I'd also look into adding some guardrails-style checks to your pipeline.
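A minimal sketch of that LLM-as-judge check. Everything here is an assumption, not a standard API: `call_llm` is a placeholder for whatever client you already use (it just needs to take a prompt string and return text), and the YES/NO prompt wording is one of many reasonable choices.

```python
def build_groundedness_prompt(query: str, docs: list[str], answer: str) -> str:
    """Assemble a yes/no judging prompt from the query, retrieved docs, and answer."""
    context = "\n\n".join(docs)
    return (
        "You are a fact-checking assistant.\n\n"
        f"Question: {query}\n\n"
        f"Retrieved documents:\n{context}\n\n"
        f"Proposed answer: {answer}\n\n"
        "Is the proposed answer fully supported by the retrieved documents? "
        "Reply with exactly YES or NO."
    )


def is_grounded(query: str, docs: list[str], answer: str, call_llm) -> bool:
    """Run the judge; call_llm is any callable mapping a prompt string to a response string."""
    verdict = call_llm(build_groundedness_prompt(query, docs, answer))
    return verdict.strip().upper().startswith("YES")
```

In your pipeline you'd run this after generation and either suppress the answer or fall back to an "I don't know" response when `is_grounded` returns `False`. Note this adds a full extra LLM call, so it's the higher-latency option.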

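For the lower-latency keyword-overlap idea, here's a rough sketch. The keyword extraction (regex tokenization minus a tiny stopword list), the direction of the check (what fraction of the output's keywords appear anywhere in the docs), and the 0.6 threshold are all illustrative assumptions you'd want to tune on your own data.

```python
import re

# Deliberately tiny stopword list for illustration; use a real one in practice.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "or", "for", "on", "with"}


def keywords(text: str) -> set[str]:
    """Crude keyword extraction: lowercase word tokens minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def keyword_overlap(docs: list[str], output: str) -> float:
    """Fraction of the output's keywords that also appear somewhere in the reference docs."""
    doc_kw = set().union(*(keywords(d) for d in docs)) if docs else set()
    out_kw = keywords(output)
    if not out_kw:
        return 1.0  # nothing to check against
    return len(out_kw & doc_kw) / len(out_kw)


def passes_overlap_check(docs: list[str], output: str, threshold: float = 0.6) -> bool:
    """Flag outputs whose vocabulary strays too far from the retrieved docs."""
    return keyword_overlap(docs, output) >= threshold
```

This is cheap (no extra model call) but blunt: paraphrases and synonyms will score low even when grounded, so it's best as a fast first-pass filter in front of a heavier check like the LLM judge.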