When running experiments, is it possible to only run a subset of the examples for faster iteration?
Yes, you can run experiments on a subset of examples using the dry_run parameter in both run_experiment() and evaluate_experiment(). Setting dry_run=True runs one sample, or dry_run=N runs N randomly selected examples, allowing for quick, deterministic testing without processing the full dataset or sending data to the Phoenix server. This is the recommended approach for faster iteration on a subset of your data.

For more advanced filtering (e.g., by label or metadata), you would need to pre-filter your dataset before passing it to Phoenix, as there is no built-in experiment API for arbitrary subset selection beyond dry_run. Would you like more detail on how to implement this?

Sources:
- Phoenix Experiments Quickstart: Dry Run
- Phoenix Experiments How-to: Dry Run
- Phoenix Experiments GitHub Docs: Dry Run
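The pre-filtering step mentioned above could look something like the following minimal sketch. It is plain Python and does not call Phoenix; the example records, the "difficulty" metadata key, and the group_by_metadata helper are all hypothetical, chosen only to illustrate splitting examples by a metadata label before they are handed to an experiment.

```python
from collections import defaultdict

# Hypothetical example records. The "difficulty" key and its values are
# assumptions for illustration, not a Phoenix convention.
examples = [
    {"input": {"question": "2 + 2?"}, "metadata": {"difficulty": "easy"}},
    {"input": {"question": "Integrate x * exp(x)."}, "metadata": {"difficulty": "hard"}},
    {"input": {"question": "Capital of France?"}, "metadata": {"difficulty": "easy"}},
    {"input": {"question": "Unlabeled question."}, "metadata": {}},
]


def group_by_metadata(records, key):
    """Group records by the value stored under metadata[key].

    Records without that key are grouped under None.
    """
    groups = defaultdict(list)
    for record in records:
        label = record.get("metadata", {}).get(key)
        groups[label].append(record)
    return dict(groups)


groups = group_by_metadata(examples, "difficulty")
easy = groups.get("easy", [])
hard = groups.get("hard", [])
both = easy + hard  # run either subset alone, or both together
```

Each resulting list would then have to be supplied to Phoenix as the dataset for a run; how that is done (for example, whether a filtered subset needs its own upload) is not shown here.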
RunLLM I would still like to track the experiment with the UI/server. Can I prefilter the dataset and pass it to the experiment method, or do I need to reupload it? My use-case is something like grading the examples as either "easy" or "hard", and then being able to run either of them alone or both together.
We just released a new feature as of Phoenix 12.7.0 that is purpose-built for this use case, called "Dataset Splits": https://arize.com/docs/phoenix/datasets-and-experiments/how-to-experiments/splits The documentation is still a work in progress, but you can select one or more examples in the UI, assign them to "splits" like "easy" or "hard", and then run just that set of examples.
More ergonomic improvements are forthcoming
Anthony P. oh nice, that looks perfect! 🙂
my recommendation would be to enter some filter on the data using the search bar (on metadata perhaps) and then select-all of the results and assign them to a split
I will file an issue to support shift-select on the checkboxes. We should definitely have that.
Anthony P. can splits be constructed from python?
that feature did not make it in for initial release, but it is planned
