Exploring Phoenix Experiments: Code Evals and CI/CD Workflow Insights
Chris: The experiments feature in Phoenix supports code evals in addition to LLM evals: https://docs.arize.com/phoenix/datasets-and-experiments/how-to-experiments/using-evaluators#code-evaluators

Tracing: I would say tracing tends to play a deeper role in the CI/CD workflow for LLM development than in traditional software. Many people save traces to datasets to build their test sets for CI/CD.

Experiments: Experiments let you run a periodic test of your prompt or model changes against test data. I see teams setting these tests up as GitHub Actions workflows: check in a prompt change, and the test runs.

Evals: In your case you would run two code-based evals and one LLM eval on the data. A place to start might be: instrument tracing, test the evals on a dataframe pull of that trace data, and once you feel good about the eval results, wrap each one cleanly in a task function for an experiment that runs against a dataset. Then collect a good test set into the dataset for the action. https://docs.arize.com/phoenix/datasets-and-experiments/how-to-experiments/run-experiments
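As a rough sketch of what the two code-based evals could look like (the function names and the specific checks here are hypothetical, not from the thread), Phoenix code evaluators can be plain Python functions that score the task output:

```python
import json

# Hypothetical code evaluators for illustration. Phoenix can accept
# plain functions like these in the `evaluators` list of an experiment
# run; see the linked docs for the exact parameter names it supports.

def output_is_valid_json(output: str) -> bool:
    """Code eval 1: does the model output parse as JSON?"""
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False

def output_within_length(output: str) -> float:
    """Code eval 2: 1.0 if the output stays under an assumed 500-char budget."""
    return 1.0 if len(output) <= 500 else 0.0
```

With Phoenix installed, these would be passed alongside the LLM eval when the experiment runs, e.g. something like `run_experiment(dataset, task, evaluators=[output_is_valid_json, output_within_length, llm_eval])`; check the run-experiments docs linked above for the current signature.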

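The intermediate step suggested above, testing evals on a dataframe pull of trace data before promoting them to an experiment, might look like the following sketch. The dataframe here is a toy stand-in for what a Phoenix client export would return, and the column name is an assumption:

```python
import pandas as pd

# Toy stand-in for a spans dataframe pulled from Phoenix tracing.
# In practice this would come from the Phoenix client; the "output"
# column name is an assumption for illustration.
spans_df = pd.DataFrame({
    "output": ['{"answer": 42}', "plain text reply", '{"answer": null}'],
})

def looks_like_json(output: str) -> bool:
    # Cheap code eval to iterate on against real trace data
    # before wrapping it into an experiment evaluator.
    stripped = output.strip()
    return stripped.startswith("{") and stripped.endswith("}")

spans_df["json_eval"] = spans_df["output"].apply(looks_like_json)
print(spans_df["json_eval"].tolist())  # -> [True, False, True]
```

Once an eval like this behaves well on the dataframe, it can be moved into the experiment's evaluator list and run against a curated dataset in the GitHub Actions workflow.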