Interesting tool by Sahil from Gumroad: https://github.com/anti-work/shortest (create tests from NL -> create evals from NL?). I recently spoke with John and Dat from the Arize team and would love to see this in Phoenix. Why this tool? As Arize is more of a dev-first tool, adding these workflows can massively help in making EDD available to PMs or SMEs. Could be a good wrapper around the testset generator by Ragas, which can be fed to a DSPy pipeline!
Interesting tool Rachitt! I was following the QA space a few years back but this one is new to me. Will definitely check it out if it's good! We're gonna be building out more AI tools into Phoenix for sure.
Great share Rachitt S.! Will be checking it out
Mikyo Aparna D.: seems the Langwatch team has DSPy optimizations, would be cool to see in Arize: https://docs.langwatch.ai/dspy-visualization/quickstart
Yeah I've seen the visualization of the optimizers. Definitely fits well into our concept of experiments. You can actually use Phoenix datasets to run an optimizer using DSPy and [Zenbase](https://github.com/zenbase-ai/core/tree/main/py) and get tracing during the optimization. We're working on the data viz for experiments, and plumbing through the DSPy assertions and metrics makes a ton of sense. We're also building native evaluations (code, LLM) to add as test cases to datasets so benchmarking can happen in the platform itself (with room for optimizers in a playground environment). What type of optimizers do you usually run, Rachitt S.? MIPRO? Few shot?
Nice to see Zenbase here, love what Cyrus is building :) I'd love to see prompt optimization using DSPy in Arize, that can be a good product feature imo. I personally start with few shot, and my sweet spot for MIPROv2 optimizations is about 500-1000 interactions; a smaller batch size makes the changes easier to debug.
