what’s the best way to save a dataset and load it back up from disk? I saved it like so:
```python
dataset_path = "dataset.json"
dataset = client.get_dataset(name=dataset_name)
df = dataset.as_dataframe()
df.to_json(dataset_path, index=False)
```

but I can’t load it back up from disk:
```
-> dataset = Dataset.from_dict(dataset)
(Pdb) dataset = Dataset.from_dict(dataset)
*** KeyError: 'dataset_id'
(Pdb) dataset.keys()
dict_keys(['input', 'output', 'metadata'])
```

I get KeyError: 'dataset_id'
The best way to save and reload a Phoenix dataset is to use the built-in Phoenix dataset persistence methods, not just saving the DataFrame to JSON. Phoenix datasets include metadata (like `id`, `version_id`, etc.) that is not present in a plain DataFrame export, so using `df.to_json()` and then trying to load with `Dataset.from_dict()` will fail due to missing required fields such as `dataset_id` (Phoenix Client Dataset class).

Instead, use the Phoenix client’s dataset save/load methods, or re-upload the DataFrame using `client.upload_dataset` if you want to persist and reload datasets. If you want to persist traces, use `trace_dataset.save(directory=...)` and `TraceDataset.load(id, directory=...)` (Phoenix Tracing Docs, Phoenix Docs: Save All Traces). There is no supported method to reconstruct a full Phoenix Dataset object from a plain DataFrame or JSON file alone.

Would you like a code example for saving and loading datasets or traces using the Phoenix client?

Sources:
- Phoenix Client Dataset class
- Phoenix Tracing Docs
- Phoenix Docs: Save All Traces
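A quick way to see why `Dataset.from_dict()` fails here: a plain DataFrame export carries only the column data, never the server-side identifiers Phoenix expects. A minimal sketch with a stand-in DataFrame (the column names mirror the thread's `dict_keys` output):

```python
import json

import pandas as pd

# Stand-in for the exported Phoenix dataset: to_json keeps only the
# columns, so server-side fields like dataset_id are gone after export.
df = pd.DataFrame({"input": ["q"], "output": ["a"], "metadata": [{}]})

records = json.loads(df.to_json(orient="records"))
print(sorted(records[0]))  # ['input', 'metadata', 'output'] -- no 'dataset_id'
```

This is exactly the shape `Dataset.from_dict()` then receives, hence the `KeyError: 'dataset_id'`.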
💡 Hint: Mention RunLLM in the thread for followups.
the client (/phoenix/session/client.py) does not have a save method. client.upload_dataset does not persist the dataset locally; it uploads it to the Phoenix server. My question is about persisting the dataset locally and loading it back up locally. The Dataset object (phoenix/experiments/types.py) does not have a .save method either RunLLM
You are correct: the Phoenix Python client (/phoenix/session/client.py) and the Dataset object do not provide a .save method for local persistence, only for uploading to the server. For local persistence, Phoenix supports saving and loading trace datasets (not general datasets) using the `TraceDataset.save(directory=...)` and `TraceDataset.load(id, directory=...)` methods, but there is no built-in method to persist a general Dataset object locally and reload it as a Phoenix Dataset (source, source).

If you need to persist a general dataset locally, you must serialize the DataFrame and schema yourself (e.g., using `df.to_parquet()` and saving the schema as JSON), then reconstruct the Dataset object manually by loading both and calling `Dataset(df, schema, name)`. There is no official Phoenix API for local dataset persistence and rehydration beyond this pattern.

Would you like a code example for this manual approach?

Sources:
- https://github.com/Arize-ai/phoenix/pull/2082
- https://github.com/Arize-ai/phoenix/issues/2196
You are correct: there is currently no public or official Phoenix API for saving and loading general Dataset objects locally; only trace datasets have this feature (source). For general datasets, the recommended manual approach is:

1. Save the DataFrame (e.g., as Parquet or CSV) and the schema (as JSON).
2. When reloading, reconstruct the DataFrame and schema, then create the Dataset object.

Example:

```python
import json

import pandas as pd
import phoenix as px

# Save
df.to_parquet("dataset.parquet")
with open("schema.json", "w") as f:
    json.dump(schema.__dict__, f)

# Load
df = pd.read_parquet("dataset.parquet")
with open("schema.json") as f:
    schema_dict = json.load(f)
schema = px.Schema(**schema_dict)
dataset = px.Dataset(df, schema, name="my_dataset")
```

This is a workaround due to the lack of a built-in API for local dataset persistence.

Sources:
- https://github.com/Arize-ai/phoenix/issues/2196
I didn’t try pickling but I certainly could. Thanks for the suggestion!
Sounds good! 🙏🏻 it will be handy
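For reference, the pickling route mentioned above can be as simple as the round-trip below. Whether the Phoenix Dataset object pickles cleanly across Phoenix and Python versions is an assumption, so this sketch uses a stand-in object and checks the round-trip before trusting the file:

```python
import pickle

# Stand-in for the Phoenix Dataset object from the thread; substitute the
# real object, but verify a round-trip first (pickles are sensitive to
# Python and library versions).
dataset = {"input": "q", "output": "a", "metadata": {}}

with open("dataset.pkl", "wb") as f:
    pickle.dump(dataset, f)

with open("dataset.pkl", "rb") as f:
    restored = pickle.load(f)

assert restored == dataset  # round-trip sanity check
```

Unlike the Parquet-plus-schema approach, this keeps the whole object in one file, at the cost of portability across versions.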
hi Andrew S. this functionality is out for PR here: https://github.com/Arize-ai/phoenix/pull/8453
beautiful. I will use this for sure
