Spent a lot of time thinking about LLM evals while compiling this just-published definitive guide! Check it out for a deep dive on LLM model evals versus LLM system evals, how to build and run LLM evals, how to benchmark evals, and a lot more.
Would love your thoughts!