Hi, I attended a talk on Arize Phoenix yesterday, which looked very promising. I had a question about the test cases I can evaluate with Phoenix. Do I have to create the test cases myself, or are they dynamically generated with an LLM using Phoenix?
Hey Omar Q., both are possible. We are agnostic with respect to the origin of the data in datasets, and we provide helpers such as `llm_generate` to help you create synthetic data.
We do have a collection of datasets that we use to benchmark the effectiveness of our default LLM evaluators. We typically find that users want to tailor datasets to their particular use case/application. John G. may have more to add on benchmarks as well.
Omar Q. - Xander covered most of the details here!
Just to understand your ideal situation: you're hoping we'd provide test cases for common evaluations that you could use off the shelf? Or give you the ability to generate test cases within the product for your specific case?
We're doing some dataset creation work now, but it leans more towards the first option.
Both, John G.: test sets for the common evals, and the ability to generate test cases dynamically based on my use case, i.e. the ability to generate test cases for my agent to test the tooling, responses, etc.