Getting an error where examples in my dataset aren't loading properly. Here's what `dataset.examples` prints:

```
{'RGF0YXNldEV4YW1wbGU6Mw==': Example(
    id="RGF0YXNldEV4YW1wbGU6Mw==",
    input={
        "doc_path": "test_data/409A Valuation Report.pdf"
    },
    output={
        "expected_summary": "This is a 409A valuation report document that..."
    },
    metadata={
        "metadata": {
            "topic": "financial"
        }
    },
), 'RGF0YXNldEV4YW1wbGU6NA==': Example(
    id="RGF0YXNldEV4YW1wbGU6NA==",
    input={
        "doc_path": "test_data/409A Valuation Report.pdf"
    },
    output={
        "expected_summary": "This is a 409A valuation report document that..."
    },
    metadata={
        "metadata": {
            "topic": "financial"
        }
    },
)}
```

And here's how I initially created the dataset:
```python
import pandas as pd

# Inside a helper function; pdf_path, dataset_name and phoenix_client are defined elsewhere.
df = pd.DataFrame(
    [
        {
            "doc_path": pdf_path,
            "expected_summary": "This is a 409A valuation report document that provides a detailed analysis and valuation of company shares for tax and compliance purposes.",
            "metadata": {"topic": "financial"},
        },
        {
            "doc_path": pdf_path,
            "expected_summary": "This is a 409A valuation report document that provides a detailed analysis and valuation of company shares for tax and compliance purposes.",
            "metadata": {"topic": "financial"},
        },
    ]
)
return phoenix_client.upload_dataset(
    dataframe=df,
    dataset_name=dataset_name,
    input_keys=["doc_path"],
    output_keys=["expected_summary"],
    metadata_keys=["metadata"],
)
```

It errors out at the line with `cache_key`:

```
cache_key = (example["id"], repetition_number)
TypeError: string indices must be integers, not 'str'
```
This happens because `example` is just the key string:

```
example
'RGF0YXNldEV4YW1wbGU6Mw=='
```

It seems likely that your code expects `dataset.examples` to be a list of example dicts, but in the dataset object returned by `upload_dataset`, `dataset.examples` is a dictionary mapping example IDs to `Example` objects, not a list. That means you should iterate over `dataset.examples.values()` or use positional indexing like `dataset[0]` rather than looking examples up by key. The error `TypeError: string indices must be integers, not 'str'` happens because you're treating a string (the key) as a dict instead of accessing the actual example object (see this issue and the migration guide).

To fix this, use `for example in dataset:` or `for example in dataset.examples.values():` to get the actual example objects, and access their fields as `example.input`, `example.output`, etc. If you want the first example, use `dataset[0]` instead of `dataset.examples[0]`. Would you like a code example for this pattern?

Sources:
- https://github.com/Arize-ai/phoenix/issues/3780
- migration guide
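A minimal, Phoenix-free sketch of the failure mode described above: iterating a dict yields its keys, so when `examples` is a dict keyed by example ID (as in the printout in the question), each `example` in the loop is an ID string and `example["id"]` raises the same `TypeError`. Iterating `.values()` yields the example objects instead. The `Example` dataclass here is a hypothetical stand-in, not Phoenix's actual class.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Example:
    """Hypothetical stand-in for a legacy Phoenix Example object."""
    id: str
    input: dict[str, Any]
    output: dict[str, Any]
    metadata: dict[str, Any] = field(default_factory=dict)


# Shaped like the printout in the question: example ID -> Example object.
examples = {
    "RGF0YXNldEV4YW1wbGU6Mw==": Example(
        id="RGF0YXNldEV4YW1wbGU6Mw==",
        input={"doc_path": "test_data/409A Valuation Report.pdf"},
        output={"expected_summary": "This is a 409A valuation report document that..."},
    ),
}

# Iterating the dict itself yields the keys, which are plain strings,
# so indexing one with "id" raises TypeError.
for example in examples:
    try:
        _ = example["id"]
    except TypeError as exc:
        print(f"TypeError: {exc}")

# Iterating .values() yields the Example objects; fields are attributes.
for example in examples.values():
    print(example.id, example.input["doc_path"])
```

The exact wording of the `TypeError` message varies slightly between Python versions; the exception type is the same.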
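This is because `examples` has changed from a dictionary to a list in the new client package. Your dataset was created with the legacy client's `upload_dataset`, so its `examples` is a dict keyed by example ID; code written for the new client iterates over `examples` expecting dicts with an `"id"` key, but iterating the legacy dict yields the ID strings instead, which is why `example["id"]` fails. To avoid the error, try creating (or re-creating) the dataset with the new client as shown below: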
```python
from phoenix.client import Client

px_client = Client()

# df and dataset_name as in the original snippet
dataset = px_client.datasets.create_dataset(
    dataframe=df,
    name=dataset_name,
    input_keys=["doc_path"],
    output_keys=["expected_summary"],
    metadata_keys=["metadata"],
)
```
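If it's unclear which client produced a given dataset object, the shape of `examples` gives it away. The sketch below is a hypothetical diagnostic helper (`example_ids` is not part of Phoenix) that handles both shapes: a dict keyed by example ID, as in the legacy printout above, and a list of dicts each carrying an `"id"` key, which is the assumed shape in the new `phoenix.client` package. It only illustrates the difference; re-creating the dataset with the new client is still the fix.

```python
from typing import Any


def example_ids(examples: Any) -> list[str]:
    """Hypothetical helper: return example IDs from either dataset shape.

    - dict keyed by example ID (legacy client, as in the question's printout)
    - list of dicts with an "id" key (assumed new phoenix.client shape)
    """
    if isinstance(examples, dict):
        # Legacy shape: the keys already are the example IDs.
        return list(examples.keys())
    # New-client shape: each item is a dict that carries its own "id".
    return [example["id"] for example in examples]


legacy_shape = {"RGF0YXNldEV4YW1wbGU6Mw==": object()}
new_shape = [
    {
        "id": "RGF0YXNldEV4YW1wbGU6Mw==",
        "input": {"doc_path": "test_data/409A Valuation Report.pdf"},
    }
]

print(example_ids(legacy_shape))  # prints ['RGF0YXNldEV4YW1wbGU6Mw==']
print(example_ids(new_shape))     # prints ['RGF0YXNldEV4YW1wbGU6Mw==']
```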