Hi friends, is there a way to filter failed examples from a specific experiment, rerun them, and add the results to the same experiment? I have a large dataset with 10k+ examples and the LLM calls failed sporadically; it would be great to just iteratively rerun the failed examples until none are left with a non-None error.
It seems likely that Phoenix supports filtering failed experiment runs (e.g., where error is not None) and viewing them, but the data sources make no explicit mention of a built-in method to rerun only the failed examples and append the results to the same experiment. Filtering by error status is discussed as a desired feature and is possible via the UI or code, but rerunning and updating the same experiment is not directly documented: reruns typically create new experiments or runs rather than updating existing ones (github_issue #5681, github_issue #5015, experiment_runs_filters.ipynb). You may need to filter the failed runs, extract their examples, and rerun them as a new experiment or batch, rather than updating the original experiment. Would you like more detail on how to programmatically filter and rerun failed examples?
Sources: github_issue #5681, experiment_runs_filters.ipynb, github_issue #5015
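For reference, here is a rough sketch of what that programmatic filter-and-rerun could look like. This is not taken from the sources above: attribute names such as experiment.runs, run.error, run.dataset_example_id, and the upload_dataset keyword shape are assumptions, so check them against your installed arize-phoenix version.
```python
# Rough sketch, NOT verified against a specific Phoenix release.
# Assumptions (check against your version): client.get_experiment(...) returns an object
# with a `runs` mapping, each run exposes `error` and `dataset_example_id`, the dataset's
# `examples` is a mapping of id -> example, and upload_dataset accepts inputs/outputs lists.
import phoenix as px
from phoenix.experiments import run_experiment

client = px.Client()

dataset = client.get_dataset(name="my-10k-dataset")                  # the original dataset
experiment = client.get_experiment(experiment_id="ORIGINAL_EXPERIMENT_ID")

# 1. Collect the ids of examples whose runs errored.
failed_example_ids = {
    run.dataset_example_id
    for run in experiment.runs.values()
    if run.error is not None
}

# 2. Re-upload just those examples as a smaller "retry" dataset.
failed_examples = [
    example for example in dataset.examples.values()
    if example.id in failed_example_ids
]
retry_dataset = client.upload_dataset(
    dataset_name="my-10k-dataset-retry",
    inputs=[example.input for example in failed_examples],
    outputs=[example.output for example in failed_examples],
)

# 3. Rerun the same task on the failed subset -- this creates a NEW experiment,
#    it does not append runs to the original one.
retry_experiment = run_experiment(
    retry_dataset,
    task=my_task,  # the same task function used for the original experiment
    experiment_name="retry-of-original",
)
```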
Yes, please show me how to programmatically filter and rerun failed examples. Understood that this would create a different experiment. But is there a workaround for me to "stitch" together two experiments' results depending on the error/success status of the individual examples?
hi Jo P., currently we don't have a great way to stitch together experiments, though I understand why you'd want this functionality. Could you give me a little more information about how the failures are happening? Experiments do have some built-in retry machinery that should alleviate most sporadic issues
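Since stitching isn't built in, one possible client-side workaround is to merge the two experiments' runs yourself, preferring the retry result wherever the original run failed. This is only a sketch under the same assumptions as above (that runs expose dataset_example_id, output, and error fields), and the merged view lives only on the client:
```python
# Client-side "stitch" sketch -- not a built-in Phoenix feature. Assumes both experiments
# expose a `runs` mapping whose entries carry `dataset_example_id`, `output`, and `error`.
import phoenix as px

client = px.Client()

def runs_by_example(experiment):
    """Index an experiment's runs by the dataset example they belong to."""
    return {run.dataset_example_id: run for run in experiment.runs.values()}

original = runs_by_example(client.get_experiment(experiment_id="ORIGINAL_EXPERIMENT_ID"))
retry = runs_by_example(client.get_experiment(experiment_id="RETRY_EXPERIMENT_ID"))

# Prefer the retry run wherever the original run failed and a retry exists.
stitched = {
    example_id: retry[example_id]
    if run.error is not None and example_id in retry
    else run
    for example_id, run in original.items()
}

# `stitched` maps each example id to its best available run; note that this merged view
# is not written back into either Phoenix experiment.
```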
About the built-in retry mechanism, how many times does it retry?
ah, that's super reasonable. by default it's set to requeue up to 10 times IIRC, but that's because occasionally requests will time out of our maximum wait time for a response
Got it. Should I put up a feature request for this?
Are there any other workarounds you'd recommend in the meantime?
yes, please file an issue and we'll try our best to prioritize it
moving experiments over to the thin phoenix client is top of mind for us so hopefully we can get to this soon
By moving experiments over to the thin phoenix client, do you mean exposing more methods / attributes on the experiment returned by:
experiment = px.Client().get_experiment(experiment_id=experiment_id)
If so, that'll be great! I naturally looked there but found nothing substantial or useful.
ah sorry for the confusion here! When you install phoenix you also install the arize-phoenix-client package, importable under phoenix.client. This is a newer version of our client with a minimal dependency footprint that we're hoping to bring to parity with all current phoenix features
the idea is we'll have a new interface for experiments under the phoenix.client package that should be (mostly) backwards compatible, lighter weight dependency-wise, and with some much-needed ergonomics updates
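For anyone reading later, the distinction is roughly between these two imports (only the imports are illustrated; the experiments interface under phoenix.client is still in progress per the message above):
```python
import phoenix as px               # classic client: px.Client().get_experiment(...)
from phoenix.client import Client  # thin client shipped by arize-phoenix-client

legacy_client = px.Client()
thin_client = Client()             # experiments support here is planned, per this thread
```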
