Loading and Modifying Datasets for Experimentation

Darshan T. · 2025-11-04T07:57:28.731Z

Can we load a dataset, modify it, and then use the modified dataset to run experiments?

8 comments

RunLLM (APP)
Yes, you can load a dataset, modify it (for example by editing, adding, or deleting examples), and then use the modified dataset to run experiments in Phoenix. The recommended workflow is to retrieve the dataset (e.g., as a DataFrame), make your changes, and then either update the dataset in Phoenix or create a new version before running experiments with it. However, direct in-place editing via the Python SDK is limited; for full edit/delete support, you may need to use the UI or GraphQL API for now, though enhancements are planned for the SDK [<https://github.com/Arize-ai/phoenix/issues/9057|source>, <https://arize.com/docs/phoenix/datasets-and-experiments/concepts-datasets|source>, <https://arize.com/docs/phoenix/cookbook/evaluation#create-an-experiment|source>].
After modifying your dataset, you can run experiments by passing the updated dataset object to the `run_experiment` function, which will execute your task and evaluators on the new data [<https://arize.com/docs/phoenix/cookbook/evaluation#create-an-experiment|source>, <https://arize.com/docs/phoenix/datasets-and-experiments/how-to-experiments/run-experiments|source>]. Would you like a step-by-step code example for this workflow?
Sources:
- <https://github.com/Arize-ai/phoenix/issues/9057|https://github.com/Arize-ai/phoenix/issues/9057>
- <https://arize.com/docs/phoenix/datasets-and-experiments/concepts-datasets|https://arize.com/docs/phoenix/datasets-and-experiments/concepts-datasets>
- <https://arize.com/docs/phoenix/cookbook/evaluation#create-an-experiment|https://arize.com/docs/phoenix/cookbook/evaluation#create-an-experiment>
Darshan T.
No, when I say modification, I mean modifying the input query of the dataset.
Priyan (External)
Darshan T., you can edit dataset examples in the UI: click on any dataset example, then click "Edit Example".
Priyan (External)
Or you can get the dataset with `get_dataset`, convert it to a DataFrame using `to_dataframe`, and then make any edits you like.
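A minimal sketch of the edit step, assuming each example has already been exported to a plain record with `input`, `output`, and `metadata` fields (for instance from the DataFrame). The `query` key and the `rewrite_queries` helper are hypothetical names for illustration and are not part of the Phoenix SDK; how the edited records are passed back to Phoenix depends on your client version.

```python
from copy import deepcopy
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def rewrite_queries(records: List[Record], rewrite: Callable[[str], str]) -> List[Record]:
    """Return edited copies of the records; the originals are left untouched."""
    edited = []
    for record in records:
        new_record = deepcopy(record)
        # Assumption: the query text lives under input["query"].
        new_record["input"]["query"] = rewrite(record["input"]["query"])
        edited.append(new_record)
    return edited


# Hypothetical example record standing in for one exported dataset row.
records = [
    {"input": {"query": "Summarize sales for {region}"}, "output": {}, "metadata": {}},
]
edited = rewrite_queries(records, lambda q: q.replace("{region}", "EMEA"))
print(edited[0]["input"]["query"])
```

Because the helper works on copies, the stored dataset stays unchanged and the edit applies only to the records used for the current run.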
Dustin N.
Hi Darshan T. In general, Phoenix experiments use dataset examples as the input to an arbitrary "task" callable. Any modifications you want can be encapsulated in that task. For instance, if you want to transform some data in all of your dataset examples (such as pulling out a specific key in the input based on the metadata), that logic can live in the task itself.
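As a rough sketch of that idea, the task below picks which input field to use based on the example's metadata. It assumes the task receives the example's `input` and `metadata` as dictionaries bound by parameter name, which is worth checking against your installed Phoenix version. The `query_key` metadata field, the `select_query` helper, and the placeholder answer are hypothetical.

```python
from typing import Any, Dict


def select_query(input: Dict[str, Any], metadata: Dict[str, Any]) -> str:
    """Pick the input field named by metadata["query_key"], defaulting to "query"."""
    key = metadata.get("query_key", "query")
    return input[key]


def task(input: Dict[str, Any], metadata: Dict[str, Any]) -> str:
    query = select_query(input, metadata)
    # Placeholder for the real model or pipeline call.
    return f"answer to: {query}"


# Calling the task directly with a hypothetical example.
print(task({"query": "a", "question_v2": "b"}, {"query_key": "question_v2"}))
```

Keeping the selection logic in its own function means the stored examples never change; only what the task sees does.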
Darshan T.
It is like a variable in the query that I need to update based on the requirement. Yes, I'm already doing it that way in the task. But for evals, I'd need to apply the same update in each eval function too, so I thought it would be smoother if I could update the dataset instead!
Dustin N.
In this case, it seems like you might want to create a new dataset or include the transformation in the task.
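One way to avoid repeating the transformation in every eval function is to put it in a single helper that both the task and the evaluators call. This is only a sketch: the `$region` variable, `EXPERIMENT_VARS`, `resolve_query`, and the toy evaluator are hypothetical, and it assumes evaluators can receive the example's `input` alongside the task `output`.

```python
from string import Template
from typing import Any, Dict

# Hypothetical per-experiment values for the variables in each stored query.
EXPERIMENT_VARS = {"region": "EMEA"}


def resolve_query(input: Dict[str, Any]) -> str:
    """Fill $variables in the stored query; unknown variables are left as-is."""
    return Template(input["query"]).safe_substitute(EXPERIMENT_VARS)


def task(input: Dict[str, Any]) -> str:
    # Placeholder for the real model call.
    return f"answer to: {resolve_query(input)}"


def mentions_query(input: Dict[str, Any], output: str) -> bool:
    """Toy evaluator: checks that the output refers to the resolved query."""
    return resolve_query(input) in output


example_input = {"query": "Summarize sales for $region"}
result = task(example_input)
print(result, mentions_query(example_input, result))
```

Changing `EXPERIMENT_VARS` then updates the query for the task and every evaluator at once, without uploading a new dataset.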
Darshan T.
Yes. It doesn't have to be in the task specifically; anywhere would be fine. Again, I don't want to create and upload a new dataset; this is only for the experiment itself. I tried it like this, but it is not working.
