Hey everyone, I'm new to Arize Phoenix and need some help. How can I integrate evaluations and display them with pre-existing trace data, as mentioned in the evals_quickstart Jupyter notebook? I'm trying to use Amazon Bedrock integration, but my Phoenix UI is empty without any trace data. See the screenshot below of my Phoenix UI and the PDF with the Jupyter notebook changes for Bedrock. Thank you!
hi Fernanda M., thanks for your interest in Phoenix! To get traces into Phoenix you first need to instrument Bedrock using our OpenInference auto-instrumentors. You can find instructions for our Bedrock auto-instrumentor here: https://github.com/Arize-ai/openinference/tree/main/python/instrumentation/openinference-instrumentation-bedrock
After instrumenting Bedrock and running a few boto3 `invoke_model` calls, you should see some traces start to show up!
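A minimal setup sketch of that flow, assuming `arize-phoenix`, `openinference-instrumentation-bedrock`, and `boto3` are installed and AWS credentials are configured. The model ID and region below are placeholders, not a recommendation:

```python
# Sketch: launch Phoenix, instrument Bedrock, then make one call so traces appear.
# Assumes valid AWS credentials; the model id and region are placeholders.
import json

import boto3
import phoenix as px
from openinference.instrumentation.bedrock import BedrockInstrumentor
from phoenix.otel import register

px.launch_app()                # start the local Phoenix UI / collector
tracer_provider = register()   # point OpenTelemetry export at Phoenix
BedrockInstrumentor().instrument(tracer_provider=tracer_provider)

client = boto3.client("bedrock-runtime", region_name="us-east-1")
response = client.invoke_model(  # this call is traced automatically
    modelId="anthropic.claude-v2",
    body=json.dumps(
        {"prompt": "\n\nHuman: Hi!\n\nAssistant:", "max_tokens_to_sample": 64}
    ),
)
```

After this runs, the Phoenix UI should no longer be empty.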
hey Dustin N., thanks for getting back to me. I looked into that one, but it only covers traces. I also need evaluations.
Hi Fernanda! Once you have traces you can pick up where the other notebook leaves off and run evals against them. Please let us know if any of the steps aren't working for you!
There are evaluation prompts included in Arize Phoenix. I was confused for a while, because most examples reference a dataframe that isn't actually shown in the example. That dataframe has to be indexed by span_id so each evaluation can be attached to the right span. This is the example that clarified it for me: https://github.com/Arize-ai/phoenix/blob/99f6a2c1b12533324132a392b0db991c35f11f94/docs/use-cases/rag-evaluation.md
`qa_with_reference_df = get_qa_with_reference(px.active_session())` is a call to a query that gets the span ids. And it's a very particular query, so you'd likely write your own. I ended up finding a way to capture the span_id directly so I wouldn't have to do an offline process, but it all depends on your use case. For me it's evaluations while optimizing a DSPy flow, which would be nice if they supported somehow.
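To make the span_id point concrete, here's a hypothetical sketch (plain pandas, invented values) of the shape that query hands back. Whatever query you end up writing, the dataframe should be one row per span and indexed by span id so evals can land on the right spans:

```python
import pandas as pd

# Hypothetical stand-in for qa_with_reference_df; the values are made up,
# but the shape matters: one row per span, indexed by the span's id.
qa_with_reference_df = pd.DataFrame(
    {
        "input": ["What is Phoenix?"],
        "output": ["Phoenix is an open-source LLM tracing and eval library."],
        "reference": ["Phoenix is Arize's open-source observability tool."],
    },
    index=pd.Index(["0123456789abcdef"], name="context.span_id"),
)

print(qa_with_reference_df.index.name)  # → context.span_id
```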
Thank you for sharing that, Arturo, that's super helpful! Yes, I apologize if those notebooks are unclear. As Arturo mentioned, getting evals into Phoenix is a multi-step process:
Instrument your application and collect traces
Export those traces as a dataframe, and structure that dataframe for the specific eval you want to run
Run evals on the dataframe
Upload those evals back to Phoenix
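The steps above can be sketched end to end. The Phoenix names here (`get_spans_dataframe`, `run_evals`, `HallucinationEvaluator`, `SpanEvaluations`, `log_evaluations`) follow the Phoenix docs, but the span ids and values are invented, and the LLM-judge and upload steps are shown commented out since they need a live session and model credentials:

```python
import pandas as pd

# Step 1 happens in your instrumented app; assume traces are already in Phoenix.

# Step 2: export traces as a dataframe indexed by span id.
# In a live session this would be something like:
#   spans_df = px.Client().get_spans_dataframe()
# Here we fake it with invented values:
spans_df = pd.DataFrame(
    {
        "input": ["What is Phoenix?"],
        "output": ["An open-source LLM observability library."],
    },
    index=pd.Index(["0123456789abcdef"], name="context.span_id"),
)

# Step 3: run an eval over the dataframe. With an LLM judge this would be:
#   from phoenix.evals import OpenAIModel, HallucinationEvaluator, run_evals
#   eval_df = run_evals(dataframe=spans_df,
#                       evaluators=[HallucinationEvaluator(OpenAIModel())])[0]
# Stand-in result with the same shape (one label per span id):
eval_df = pd.DataFrame(
    {"label": ["factual"], "score": [1]}, index=spans_df.index
)

# Step 4: upload the evals back to Phoenix so they show up in the UI:
#   import phoenix as px
#   from phoenix.trace import SpanEvaluations
#   px.Client().log_evaluations(
#       SpanEvaluations(eval_name="Hallucination", dataframe=eval_df))
```

The key invariant across all four steps is that the eval dataframe keeps the same span-id index as the exported traces, which is how Phoenix knows where each eval result belongs.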
