RAG Relevance Evals: which models to use
For task-specific Evals, we are finding that in some cases only GPT-4 is usable as a judge. The Eval judge should be your largest, most capable model.
In our testing of Claude V2 on RAG Relevance Evals, it struggles quite a lot.
https://docs.arize.com/phoenix/llm-evals/running-pre-tested-evals/retrieval-rag-relevance
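As a rough sketch of the LLM-as-judge pattern these Evals rely on (a hypothetical illustration, not the Phoenix API itself): the judge model receives a question and a retrieved chunk, and must answer on a fixed set of rails such as "relevant" / "irrelevant". Passing the judge in as a callable makes it easy to swap in your largest model. All names here are illustrative.

```python
# Minimal sketch of an LLM-as-judge relevance eval. The judge model is
# passed in as a callable so a large model (e.g. GPT-4) can be plugged in.
# Function and template names are illustrative, not from any library.
from typing import Callable, List

RAILS = ("relevant", "irrelevant")

PROMPT_TEMPLATE = (
    "You are comparing a reference text to a question.\n"
    "Question: {query}\n"
    "Reference: {chunk}\n"
    "Respond with a single word, either 'relevant' or 'irrelevant'."
)

def judge_relevance(
    query: str,
    chunks: List[str],
    judge: Callable[[str], str],
) -> List[str]:
    """Label each retrieved chunk, snapping the raw output onto the rails."""
    labels = []
    for chunk in chunks:
        raw = judge(PROMPT_TEMPLATE.format(query=query, chunk=chunk))
        label = raw.strip().lower()
        # Anything off-rails is flagged rather than silently accepted;
        # weaker judge models often fail exactly here.
        labels.append(label if label in RAILS else "unparseable")
    return labels

if __name__ == "__main__":
    # Stub judge standing in for a real GPT-4 call.
    def fake_judge(prompt: str) -> str:
        return "relevant" if "Paris" in prompt else "irrelevant"

    print(judge_relevance(
        "What is the capital of France?",
        ["Paris is the capital of France.", "Bananas are yellow."],
        fake_judge,
    ))
```

The rails check is also why judge-model choice matters: smaller models tend to drift off the allowed labels or hedge, which shows up as unparseable outputs in the eval.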