Hi, I attended a talk on Arize Phoenix yesterday which looked very promising. I had a question about the test cases that I can evaluate with Phoenix. Do I have to create the test cases myself, or are they dynamically generated with an LLM using Phoenix?
We do have a collection of datasets that we use to benchmark the effectiveness of our default LLM evaluators. We typically find that users want to tailor datasets to their particular use case/application. John G. may have more to add on benchmarks as well.
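To make "tailoring a dataset to your use case" concrete, here's a minimal sketch of hand-written test cases as plain Python. All the names here (the fields, the `evaluate` helper, the stub model) are illustrative assumptions, not a Phoenix API:

```python
# Hand-crafted eval test cases tailored to a hypothetical support-bot use case.
# Field names ("input", "reference", "expected_label") are illustrative only.
test_cases = [
    {
        "input": "What is the refund window for annual plans?",
        "reference": "Annual plans can be refunded within 30 days of purchase.",
        "expected_label": "factual",
    },
    {
        "input": "Can I pay with cryptocurrency?",
        "reference": "We accept credit cards and bank transfers only.",
        "expected_label": "factual",
    },
]

def evaluate(predict, cases):
    """Run a model callable over the cases and report simple accuracy."""
    correct = sum(
        1 for c in cases
        if predict(c["input"], c["reference"]) == c["expected_label"]
    )
    return correct / len(cases)

# A stub "model" that always answers "factual", just to show the loop runs.
accuracy = evaluate(lambda question, reference: "factual", test_cases)
print(accuracy)  # 1.0 with the stub
```

The point is just that a tailored dataset can start as a small list of labeled examples from your own application, which you then grow and run against your evaluator of choice.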
Cool thanks, super helpful
Definitely let us know what kind of content or talks you want to see! Might be worth checking out our blog and YouTube channel as well.
Omar Q. - Xander covered most of the details here! Just to understand your ideal situation: are you hoping we'd provide test cases for common evaluations that you could use off the shelf, or give you the ability to generate test cases within the product for your specific case? We're doing some dataset creation work now, but it leans more towards the first option.
