Error Loading Dataset Examples: TypeError Due to String Indexing

Aug 29, 2025 09:49 PM

I'm getting an error where the examples in my dataset aren't loading properly. Here's what dataset.examples prints:

{'RGF0YXNldEV4YW1wbGU6Mw==': Example(
    id="RGF0YXNldEV4YW1wbGU6Mw==",
    input={
        "doc_path": "test_data/409A Valuation Report.pdf"
    },
    output={
        "expected_summary": "This is a 409A valuation report document that..."
    },
    metadata={
        "metadata": {
            "topic": "financial"
        }
    },
), 'RGF0YXNldEV4YW1wbGU6NA==': Example(
    id="RGF0YXNldEV4YW1wbGU6NA==",
    input={
        "doc_path": "test_data/409A Valuation Report.pdf"
    },
    output={
        "expected_summary": "This is a 409A valuation report document that..."
    },
    metadata={
        "metadata": {
            "topic": "financial"
        }
    },
)}

and here's how I initially created the dataset:


    df = pd.DataFrame(
        [
            {
                "doc_path": pdf_path,
                "expected_summary": "This is a 409A valuation report document that provides a detailed analysis and valuation of company shares for tax and compliance purposes.",
                "metadata": {"topic": "financial"},
            },
            {
                "doc_path": pdf_path,
                "expected_summary": "This is a 409A valuation report document that provides a detailed analysis and valuation of company shares for tax and compliance purposes.",
                "metadata": {"topic": "financial"},
            },
        ]
    )

    return phoenix_client.upload_dataset(
        dataframe=df,
        dataset_name=dataset_name,
        input_keys=["doc_path"],
        output_keys=["expected_summary"],
        metadata_keys=["metadata"],
    )

It errors out at the line that builds cache_key:

cache_key = (example["id"], repetition_number)
TypeError: string indices must be integers, not 'str'

because example at that point is just the ID string (the dict key), not an Example object:
'RGF0YXNldEV4YW1wbGU6Mw=='
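
For anyone hitting the same thing, here is a minimal sketch of why the traceback looks like that. It assumes the failing code iterates directly over dataset.examples, which is a guess based on the output above. Plain dicts stand in for the real Example objects, and none of the names below come from the Phoenix library itself.

```python
# Stand-in for dataset.examples as printed above: a dict keyed by example ID.
# The values are plain dicts here (an assumption), not real Example objects.
examples = {
    "RGF0YXNldEV4YW1wbGU6Mw==": {
        "id": "RGF0YXNldEV4YW1wbGU6Mw==",
        "input": {"doc_path": "test_data/409A Valuation Report.pdf"},
    },
    "RGF0YXNldEV4YW1wbGU6NA==": {
        "id": "RGF0YXNldEV4YW1wbGU6NA==",
        "input": {"doc_path": "test_data/409A Valuation Report.pdf"},
    },
}

repetition_number = 1  # hypothetical value, only here to build the tuple

# Iterating over a dict yields its keys, so each `example` is an ID string.
# Indexing a string with "id" raises the TypeError from the traceback.
error_message = None
try:
    for example in examples:
        cache_key = (example["id"], repetition_number)
except TypeError as exc:
    error_message = str(exc)

# Iterating over the values yields the example records themselves.
cache_keys = [
    (example["id"], repetition_number) for example in examples.values()
]
```

If that assumption holds, the code building cache_key expects a sequence of example records, while this dataset object exposes a mapping from ID to example.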


    from phoenix.client import Client

    px_client = Client()
    px_client.datasets.create_dataset(
        dataframe=df,
        name=dataset_name,
        input_keys=["doc_path"],
        output_keys=["expected_summary"],
        metadata_keys=["metadata"],
    )
