Hello... I am using Arize Phoenix and really love it. Is there a way to export a trace or create archived reports?
Yep! It's easy to export traces via the get_spans_dataframe function. You can find more info on this docs page: https://docs.arize.com/phoenix/how-to/export-your-data I believe the Phoenix team has also added examples to some of the tutorials, such as this one: https://github.com/Arize-ai/phoenix/blob/main/tutorials/tracing/openai_tracing_tutorial.ipynb
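As a rough sketch of what that looks like: in a live Phoenix session you'd pull the real spans (e.g. via get_spans_dataframe, commented out below); here a tiny made-up dataframe stands in so the snippet runs on its own, and the column names are illustrative rather than Phoenix's exact schema.

```python
import pandas as pd

# In a live Phoenix session you'd pull real spans, e.g.:
#   import phoenix as px
#   spans_df = px.Client().get_spans_dataframe()
# Here we build a tiny stand-in dataframe so the sketch is self-contained.
spans_df = pd.DataFrame(
    {
        "name": ["llm_call", "retrieve_docs"],
        "span_kind": ["LLM", "RETRIEVER"],
        "latency_ms": [812, 45],
    },
    index=pd.Index(["span-1", "span-2"], name="context.span_id"),
)

# Archive the trace data to disk for later reporting.
spans_df.to_csv("spans_export.csv")
```

From there the CSV (or parquet, if you prefer) is your archived report of the trace.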
Awesome. I'll give it a try. Thanks
That worked great for me. However, I also want to get the NDCG, precision, hit rate, and QA correctness metrics, as well as the hallucination and evaluation data. TIA.
Evaluation data can also be exported via the get_evaluations function. https://github.com/Arize-ai/phoenix/blob/main/src/phoenix/session/session.py#L273 . Given that spans are not 1:1 with evals, evals are exported separately from spans, but you can merge them post-export. As for exporting a report containing overall metrics generated in Phoenix (hit rate, precision), I don't believe there's support for that at the moment. I'll let the Phoenix team chime in later in case they have anything else to add. In the meantime, feel free to add any other context around your use case and what you're intending to do with the exported data and reports -- we'd love to hear about it!
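The post-export merge is just a plain dataframe join on the span id. A minimal sketch with made-up frames standing in for the real exports (column names here are illustrative, not Phoenix's exact schema):

```python
import pandas as pd

# Stand-ins for the exported spans and evals (illustrative columns only).
spans_df = pd.DataFrame(
    {"input": ["question 1", "question 2"], "output": ["answer 1", "answer 2"]},
    index=pd.Index(["s1", "s2"], name="context.span_id"),
)
halluc_evals_df = pd.DataFrame(
    {"label": ["factual", "hallucinated"], "score": [1.0, 0.0]},
    index=pd.Index(["s1", "s2"], name="context.span_id"),
)

# Evals are not 1:1 with spans, so a left join keeps every span
# and attaches eval columns only where they exist.
merged = spans_df.join(halluc_evals_df, how="left")
```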
Thanks again... I'm experimenting with different methods of loading markdown into a RAG using Langchain. I've written simple benchmarks for all different methods of the data ingest process. I'm comparing the RAGs' built-in similarity scores to Arize's metrics to try and predict what will be the best method.
I ran get_evaluations and got this:
[SpanEvaluations(eval_name='Hallucination', id=UUID('a4a9fe26-1074-44ea-bf73-3ccaf46b29ff')),
SpanEvaluations(eval_name='QA Correctness', id=UUID('3cb86402-b7b9-435c-993b-e81da02b2af5')),
DocumentEvaluations(eval_name='Relevance', id=UUID('168ebcf4-69ba-44a5-be36-50091a68a482'))]
How would I retrieve the actual data from a specific evaluation? TIA.
Any thoughts?
To get the specific evaluations, you can call .get_dataframe() on each result. You should be able to see the span id and the specific evaluation labels/scores/explanations where applicable. And if you want to join them to your spans, you can do so via the span_id.
You can see the other functions you can call, including persisting the evals to disk, here: https://github.com/Arize-ai/phoenix/blob/main/src/phoenix/trace/span_evaluations.py#L97
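For instance, once you have the dataframe in hand you can also persist it yourself with plain pandas (a sketch; eval_df below is a made-up stand-in for what .get_dataframe() would return, with illustrative columns):

```python
import pandas as pd

# Stand-in for the result of calling .get_dataframe() on one of the
# SpanEvaluations objects (illustrative columns, not the exact schema).
eval_df = pd.DataFrame(
    {
        "label": ["factual", "hallucinated"],
        "score": [1.0, 0.0],
        "explanation": ["grounded in context", "claim not in context"],
    },
    index=pd.Index(["s1", "s2"], name="context.span_id"),
)

# Persist as CSV for an archived report; parquet also works if pyarrow is installed.
eval_df.to_csv("hallucination_evals.csv")
```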
Let me know if that makes sense and gets you what you need 😄
Thanks, Nate.
no problem!
