Exporting All Columns from Phoenix LLM Evaluation Session to DataFrame

Jan 17, 2024 04:59 PM

Hi - I'm using Phoenix for LLM eval and have created a session with a lot of columns, including evaluations and answers to questions. When I try to export the entire session to a dataframe using px.active_session().get_spans_dataframe('span_kind == "RETRIEVER"') or just px.active_session().get_spans_dataframe(), it exports many columns but not the Eval columns that I sent to the session for display, nor the 'output' column that contains the answers to my input queries. Is there any way to faithfully export every column that appears in my Phoenix session?

Mikyo
If you want all the spans as a dataframe, David K., you won't want filters. Also take a look at our docs on extracting data; they might clarify some things.
Mikyo
https://docs.arize.com/phoenix/how-to/extract-data-from-spans
Xander S.
Thanks for raising the issue, David K. Just to clarify: are you noticing a missing output column only when using the 'span_kind == "RETRIEVER"' query, or also when running px.active_session().get_spans_dataframe() without a query?
David K.
Hi Xander - I see the output column now in the export from px.active_session().get_spans_dataframe(). I had missed it before because it's labelled attributes.output.value, whereas in the session itself it's just called "output". I don't see this column in the 'span_kind == "RETRIEVER"' filtered output, which is perhaps expected. The Eval columns are always missing, but that looks like it'll be covered by #2085 mentioned earlier. Regards,
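Until the Eval columns are part of the export, one possible workaround is to join your own evaluation results onto the spans dataframe. This is a minimal sketch, not a Phoenix API: it assumes the exported spans dataframe is indexed by span ID (the index name `context.span_id` is an assumption) and that you kept your evaluations in a separate dataframe keyed the same way. The sample rows and the `eval.relevance.label` column name are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the spans export; in a real session this would
# come from px.active_session().get_spans_dataframe().
spans_df = pd.DataFrame(
    {
        "span_kind": ["LLM", "RETRIEVER"],
        "attributes.output.value": ["Paris is the capital of France.", None],
    },
    index=pd.Index(["span-1", "span-2"], name="context.span_id"),
)

# Hypothetical evaluation results computed and logged yourself,
# keyed by the same span IDs (assumed index name).
evals_df = pd.DataFrame(
    {"eval.relevance.label": ["relevant"]},
    index=pd.Index(["span-1"], name="context.span_id"),
)

# A left join on the index keeps every span; spans without an eval get NaN.
merged = spans_df.join(evals_df, how="left")
print(merged)
```

A left join is used so that no spans are dropped just because they were never evaluated.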
Xander S.
Thanks for the follow-up, David K. Retriever spans don't typically contain the output from the LLM itself, just the documents retrieved from the vector store; the LLM's output is usually attached to an LLM span. When you run px.active_session().get_spans_dataframe() with no query, you see all of the spans, including the LLM spans that carry the output. When you filter to retriever spans with 'span_kind == "RETRIEVER"', those outputs are excluded.
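The effect described above can be illustrated with plain pandas on a made-up spans table. This is a sketch with hypothetical data: the `span_kind` and `attributes.output.value` column names mirror the ones mentioned in this thread, and pandas' own `DataFrame.query` stands in for Phoenix's span filter, which uses a similar expression but is not the same engine.

```python
import pandas as pd

# Made-up spans from one request: a retriever span followed by an LLM span.
spans_df = pd.DataFrame(
    {
        "span_kind": ["RETRIEVER", "LLM"],
        "attributes.output.value": [None, "The answer to the user's question."],
    }
)

# Unfiltered: the LLM span, and therefore its output, is present.
all_outputs = spans_df["attributes.output.value"].dropna()

# Filtered to retriever spans: only the retrieval step remains, so no LLM output.
retriever_only = spans_df.query('span_kind == "RETRIEVER"')
retriever_outputs = retriever_only["attributes.output.value"].dropna()

print(len(all_outputs), len(retriever_outputs))
```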
David K.
Thanks, Xander. That matches my understanding now. In general, though, I think there's a latent issue: the column names produced by get_spans_dataframe() don't always (or even usually) match the ones shown in the session itself. A more direct match would make it easier to link the session columns to the exported dataframe columns.
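Until the names line up, a small hand-maintained rename map can bridge the two. This is a sketch under that assumption: only the `attributes.output.value` to `output` pairing comes from this thread; the `attributes.input.value` to `input` entry is a hypothetical example of extending the map, and the sample row is made up.

```python
import pandas as pd

# Hand-maintained map from exported dataframe column names to the labels shown
# in the Phoenix UI. Only the output entry is confirmed in this thread; the
# input entry is an illustrative assumption.
UI_COLUMN_NAMES = {
    "attributes.output.value": "output",
    "attributes.input.value": "input",
}

spans_df = pd.DataFrame(
    {
        "attributes.input.value": ["What is the capital of France?"],
        "attributes.output.value": ["Paris."],
        "span_kind": ["LLM"],
    }
)

# rename() ignores keys that are absent, so unmapped columns keep their names.
ui_named = spans_df.rename(columns=UI_COLUMN_NAMES)
print(list(ui_named.columns))
```

Keeping the map in one place means it only needs updating if the export or UI naming changes.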
Xander S.
Thanks for the feedback. It sounds like two issues are contributing to this: first, some columns are omitted when a filter is applied; second, the names displayed in the UI can differ from the dataframe column names.
David K.
Yes, agreed - I think you've captured the right issues here.