Tutorial on running evals step by step
The standard for evaluating text has long been human labeling, but human evaluation is expensive to set up and maintain.
AI engineers are now relying on LLMs to evaluate the performance of their applications. We built the open-source Arize Phoenix LLM Evals library for simple, fast, and accurate LLM-based evaluations.
Get a quick taste in this demo (the second in a series) exploring how you can evaluate your LLM app for things like hallucinations leveraging Phoenix.
https://www.youtube.com/watch?v=5yjcbQDLLnw&feature=youtu.be
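To give a flavor of the workflow the video walks through, here is a minimal sketch of a hallucination eval with Phoenix Evals. It assumes the arize-phoenix-evals package and an OpenAI API key; exact import paths, the model argument name, and the template's expected column names (input/reference/output) can vary by version, so treat this as illustrative rather than canonical.

```python
# Minimal sketch of an LLM-based hallucination eval with Phoenix Evals.
# Import paths, column names, and model arguments are assumptions based on
# the arize-phoenix-evals package and may differ by version.
import pandas as pd
from phoenix.evals import (
    HALLUCINATION_PROMPT_TEMPLATE,
    HALLUCINATION_PROMPT_RAILS_MAP,
    OpenAIModel,
    llm_classify,
)

# Each row pairs a user query, the retrieved reference text, and the app's
# answer -- the three inputs the hallucination template checks against.
df = pd.DataFrame(
    {
        "input": ["Who wrote Hamlet?"],
        "reference": ["Hamlet is a tragedy written by William Shakespeare."],
        "output": ["Hamlet was written by Charles Dickens."],
    }
)

# The judge model labels each row with one of the template's rails
# (e.g. "factual" / "hallucinated") and can also return an explanation.
rails = list(HALLUCINATION_PROMPT_RAILS_MAP.values())
results = llm_classify(
    dataframe=df,
    model=OpenAIModel(model="gpt-4o"),  # judge model; assumes OPENAI_API_KEY is set
    template=HALLUCINATION_PROMPT_TEMPLATE,
    rails=rails,
    provide_explanation=True,
)
print(results[["label", "explanation"]])
```

The returned dataframe lines up row-for-row with your input data, so you can join the labels back onto your traces to see which responses were flagged as hallucinated.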