EvalGen Overview: Insights from Berkeley's Recent Research
If you missed last week’s discussion on EvalGen, the blog and podcast versions are available (link at the end). ✔️ Aparna D. and SallyAnn D. give an overview of the recent paper from researchers at Berkeley, “Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences.” They also walk through the workflow, discuss whether more assertions in evals are always a good thing, and draw out implications from the user study described in the paper. Don’t miss the takeaways for app builders at the very end! https://arize.com/blog/breaking-down-evalgen-who-validates-the-validators/
