Troubleshooting Save/Load TraceDataset in Python: Common Errors

David K. · 2024-01-26T00:01:16.989Z

Hi - I'm having difficulty using the new Save/Load TraceDataset feature. I started with the GitHub template:

dataset = TraceDataset(...)
dataset_id = dataset.save()
loaded_dataset = TraceDataset.load(dataset_id)

But I'm having trouble properly specifying what should go inside the TraceDataset() parentheses, since I can grab all spans but I can't seem to format them properly to pass as a TraceDataset parameter:

from phoenix.trace import TraceDataset
allspans=px.active_session().get_spans_dataframe()
dataset = TraceDataset.from_spans(allspans) ######### Generates an error

What's the proper way to do this, for both loading and saving? Thanks, David

Xander S.
·
Hey David K., thanks for the note! Do you mind dropping the stacktrace you're hitting?
Xander S.
·
Also curious, do you hit the same issue if you use px.active_session().get_trace_dataset()?

David K.

Hi Xander -- Here is the traceback:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[56], line 4
      2 allspans=px.active_session().get_spans_dataframe()
      3 #allspans=px.active_session().get_trace_dataset()
----> 4 dataset = TraceDataset.from_spans(allspans)

File ~\AppData\Local\anaconda3\envs\testdir\Lib\site-packages\phoenix\trace\trace_dataset.py:139, in TraceDataset.from_spans(cls, spans)
    128 @classmethod
    129 def from_spans(cls, spans: List[Span]) -> "TraceDataset":
    130     """Creates a TraceDataset from a list of spans.
    131 
    132     Args:
   (...)
    136         TraceDataset: A TraceDataset containing the spans.
    137     """
    138     return cls(
--> 139         pd.json_normalize(
    140             (json.loads(span_to_json(span)) for span in spans),  # type: ignore
    141             max_level=1,
    142         )
    143     )

File ~\AppData\Local\anaconda3\envs\testdir\Lib\site-packages\pandas\io\json\_normalize.py:461, in _json_normalize(data, record_path, meta, meta_prefix, record_prefix, errors, sep, max_level)
    458     return DataFrame(_simple_json_normalize(data, sep=sep))
    460 if record_path is None:
--> 461     if any([isinstance(x, dict) for x in y.values()] for y in data):
    462         # naive normalization, this is idempotent for flat records
    463         # and potentially will inflate the data considerably for
    464         # deeply nested structures:
    465         #  {VeryLong: { b: 1,c:2}} -> {VeryLong.b:1 ,VeryLong.c:@}
    466         #
    467         # TODO: handle record value which are lists, at least error
    468         #       reasonably
    469         data = nested_to_record(data, sep=sep, max_level=max_level)
    470     return DataFrame(data)

File ~\AppData\Local\anaconda3\envs\testdir\Lib\site-packages\pandas\io\json\_normalize.py:461, in <genexpr>(.0)
    458     return DataFrame(_simple_json_normalize(data, sep=sep))
    460 if record_path is None:
--> 461     if any([isinstance(x, dict) for x in y.values()] for y in data):
    462         # naive normalization, this is idempotent for flat records
    463         # and potentially will inflate the data considerably for
    464         # deeply nested structures:
    465         #  {VeryLong: { b: 1,c:2}} -> {VeryLong.b:1 ,VeryLong.c:@}
    466         #
    467         # TODO: handle record value which are lists, at least error
    468         #       reasonably
    469         data = nested_to_record(data, sep=sep, max_level=max_level)
    470     return DataFrame(data)

AttributeError: 'str' object has no attribute 'values'

If I try px.active_session().get_trace_dataset(), I get

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[57], line 3
      1 from phoenix.trace import TraceDataset
      2 #allspans=px.active_session().get_spans_dataframe()
----> 3 allspans=px.active_session().get_trace_dataset()
      4 dataset = TraceDataset.from_spans(allspans)

AttributeError: 'ThreadSession' object has no attribute 'get_trace_dataset'

I'm not sure if I called that as you intended. Perhaps I'm missing an import? Regards, David
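
Note on the first traceback: the signature shown there is from_spans(cls, spans: List[Span]), and the method loops over its argument. Iterating a pandas DataFrame yields its column labels (strings) rather than rows, which would explain why pandas ends up calling .values() on a str. The sketch below reproduces only that final check with the standard library. The column labels are hypothetical, and the json round trip is an assumption about how a plain string would pass through span_to_json, not something confirmed in the thread.

```python
import json

# Hypothetical column labels, standing in for what iterating a spans
# dataframe yields (iterating a pandas DataFrame gives its column labels).
column_labels = ["name", "span_kind"]

# Assumption: serializing and re-parsing a plain string leaves it a string,
# so the normalizer receives strings instead of dict records.
records = [json.loads(json.dumps(label)) for label in column_labels]

# The same expression as line 461 of the pandas frame in the traceback.
try:
    any([isinstance(x, dict) for x in y.values()] for y in records)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

The same frame shows from_spans passing a DataFrame to cls(...), which suggests the TraceDataset constructor may accept the spans dataframe directly, e.g. TraceDataset(allspans). That is an inference from the traceback rather than something verified here.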

Xander S.
·
David K. Can you try to upgrade your version of Phoenix to the latest with pip install --upgrade arize-phoenix and run the px.active_session().get_trace_dataset() command again?
Xander S.
·
It is a new addition, I suspect you might be running a version of Phoenix that doesn't have it yet.
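
A quick way to check which version the running kernel actually sees (which can differ from what pip reports if the notebook uses another environment) is the standard library's importlib.metadata. This is a minimal sketch; the helper name installed_version is made up for illustration.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Run this inside the notebook kernel, not in a separate shell.
print(installed_version("arize-phoenix"))
```

If the printed version is older than expected, the kernel is probably bound to a different environment than the one that was upgraded, or it has not been restarted since the upgrade.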

David K.

Hi Xander -- I had upgraded to 2.7.0 before my last email to you, and I just checked again. I still get the same issue. I started a new notebook (but maintained my active Phoenix session with 200 traces) and tried again, with the same results:

!pip show arize-phoenix

Name: arize-phoenix
Version: 2.7.0
Summary: ML Observability in your notebook

import phoenix as px
from phoenix.trace import TraceDataset
allspans=px.active_session().get_trace_dataset()
dataset = TraceDataset.from_spans(allspans)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[5], line 4
      2 from phoenix.trace import TraceDataset
----> 3 allspans=px.active_session().get_trace_dataset()
      4 dataset = TraceDataset.from_spans(allspans)

AttributeError: 'NoneType' object has no attribute 'get_trace_dataset'
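
Note on this traceback: 'NoneType' object has no attribute 'get_trace_dataset' means px.active_session() itself returned None, so the new notebook's kernel had no Phoenix session of its own, regardless of version. A small guard makes that failure easier to read. This is a sketch under that assumption; require_session is a hypothetical helper, not part of Phoenix.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def require_session(get_session: Callable[[], Optional[T]]) -> T:
    """Call get_session and fail with a clear message if it returns None."""
    session = get_session()
    if session is None:
        raise RuntimeError(
            "No active Phoenix session in this kernel; "
            "launch one (e.g. px.launch_app()) before exporting traces."
        )
    return session


# Intended use in a notebook (requires arize-phoenix, so left commented out):
# import phoenix as px
# dataset = require_session(px.active_session).get_trace_dataset()
```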

Xander S.

Can you try this?

import phoenix as px

# run your app here to collect traces

dataset = px.active_session().get_trace_dataset()

David K.
·
Hi Xander -- When I tried your request, I found I couldn't generate any traces at all with 2.7.0 or with 2.6.0. However, when I downgraded back to 2.4.0, the traces returned. Each time after downgrading I exited my environment and relaunched. I think my initial problem may have been that my active session was originally generated in 2.4.0, but then I was trying to save in 2.6.0/2.7.0. However, I have yet to have success in generating any traces in 2.6.0/2.7.0 -- but traces work again immediately upon downgrading to 2.4.0, exiting the environment, re-entering, and running. Regards,
Xander S.
·
Hey David K., are you using LlamaIndex? Mind sharing out a code snippet?
David K.
·
Hi Xander -- Yes, I'm using LlamaIndex. I've attached a test notebook with 2 pickled files used by it. When I run with version 2.4.0, I get the proper trace feed in Phoenix. Switching to 2.6.0 or 2.7.0 and then resetting the kernel and running, I don't get any trace feed. Reverting to 2.4.0 and resetting the kernel and running again gives me a proper trace feed. Thanks and regards, David
Xander S.
·
Thanks, will take a look!
Xander S.
·
I'm not able to reproduce this issue on 2.7.0, David K. Can you ensure that Phoenix is running and your LlamaIndex application is instrumented with set_global_handler?
David K.
·
Hi Xander -- Strange indeed. I use set_global_handler in my application just as in the example I sent you. I did a clean reboot and I even upgraded LlamaIndex to the latest version, and still no success. I can only get traces on 2.4, not 2.6 or 2.7. The reason I wanted to upgrade to 2.6 was to get the save/restore session functionality. Is there code I can run natively in 2.4.0 that will accomplish this? Regards, David
Xander S.
·
Hey David K., unfortunately, you'll need the more recent versions of Phoenix to persist and load trace data.
Xander S.
·
If you have a few minutes, please book a time on my calendar and we'll help you resolve the issue. https://calendly.com/xander-arize/30min
