Integrating Ragas with OpenTelemetry in Phoenix for Evals

Kurt T. · 2025-07-28T15:55:21.495Z

I'm using Ragas for evals and OpenTelemetry (by way of LlamaIndex), and would like to set the Ragas output as attributes in Phoenix. Is there a playbook for how to do this so I don't have to fumble through figuring it out? The nuance, of course, is that to set the attributes in Phoenix, you need to be able to identify the original span the query created, after the fact.
1. LLM query: creates a span
2. Ragas evals the LLM query
3. Now set attributes on the original span

RunLLM
APP
·
There is no official playbook for directly setting Ragas evaluation outputs as attributes on the original Phoenix span after the fact. However, the recommended approach is to log your Ragas evaluation results as Phoenix evaluations (not as span attributes), using the `SpanEvaluations` class and linking them to the correct `span_id`. This is the supported and visible way to annotate spans with evals in Phoenix. You generate a DataFrame with your Ragas scores, ensure it has a `span_id` column matching the original spans, and then log it via `px.Client().log_evaluations(SpanEvaluations(eval_name, evals_df))` (https://github.com/Arize-ai/phoenix/blob/main/tutorials/ragas_retrieval_evals_tutorial.ipynb, https://arize.com/docs/phoenix/cookbook/evaluation/evaluate-rag).
If you want to set custom attributes on spans after the fact, Phoenix does not support mutating span attributes post-creation; attributes must be set at span creation time (https://arize-ai.slack.com/archives/C04R3GXC8HK/p1716404752.254739). The supported workflow is to log evaluations as annotations, not as attributes, and these will appear in the Phoenix UI alongside the relevant spans.
Would you like a step-by-step example of how to structure your Ragas eval DataFrame and log it to Phoenix?
Sources:
- Ragas Retrieval Evals Tutorial: https://github.com/Arize-ai/phoenix/blob/main/tutorials/ragas_retrieval_evals_tutorial.ipynb
- Slack: Custom Evaluators and Span Annotation: https://arize-ai.slack.com/archives/C04R3GXC8HK/p1716404752.254739
- https://arize.com/docs/phoenix/cookbook/evaluation/evaluate-rag
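As a rough illustration of the workflow described above, the following Python sketch shapes Ragas scores into a span-keyed table and logs them with `SpanEvaluations`. It assumes each Ragas result row already carries the `span_id` of the span it evaluated. The helper names (`ragas_rows_to_eval_records`, `log_ragas_scores`), the `faithfulness` metric key, the sample span ID, and the eval name are all illustrative and are not part of Phoenix or Ragas.

```python
from typing import Iterable, List, Mapping


def ragas_rows_to_eval_records(rows: Iterable[Mapping], metric: str) -> List[dict]:
    """Reduce Ragas result rows to {span_id, score} records for one metric.

    Assumption: each row already includes the `span_id` of the span it evaluated.
    """
    return [{"span_id": row["span_id"], "score": float(row[metric])} for row in rows]


def log_ragas_scores(records: List[dict], eval_name: str) -> None:
    """Send the records to a running Phoenix instance as span evaluations."""
    # Imported here so the pure helper above works without Phoenix installed.
    import pandas as pd
    import phoenix as px
    from phoenix.trace import SpanEvaluations

    # Evaluations are matched to spans by span ID, so use it as the index.
    evals_df = pd.DataFrame(records).set_index("span_id")
    px.Client().log_evaluations(SpanEvaluations(eval_name=eval_name, dataframe=evals_df))


# Hypothetical sample row; a real run would use the Ragas result rows instead.
records = ragas_rows_to_eval_records(
    [{"span_id": "00000000000000ff", "faithfulness": 0.9}], metric="faithfulness"
)
# log_ragas_scores(records, eval_name="ragas_faithfulness")  # needs a Phoenix server
```

The open question in this approach is where the `span_id` values come from, since Ragas itself does not know about spans.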
Mikyo
·
Hey Kurt T., I'm running into a meeting here, so my response might be slightly delayed. Just to understand: when you run Ragas, you run it during the execution of a LlamaIndex query, is that correct? In general, we do need an OpenTelemetry ID to join things back. Let me get a dev-rel person to help out with your use case. Thanks for the feedback!
Kurt T.
·
I run the query (by way of LlamaIndex). Then I run Ragas to evaluate the results of the query.
Kurt T.
·
Then I want to put those Ragas results back into Phoenix on the query so we can do all the review, looking at the Ragas results and the span details.
Kurt T.
·
Per the link above from RunLLM... maybe I'm trying to solve the wrong problem? Let me state the problem itself: I want to easily be able to look at spans, and see what Ragas has to say about the query that formed the span. 🙂 The link above suggests what I'm trying to do isn't even possible.
Kurt T.
·
Let me restate: What I was trying to do was:
1. Run a query by way of LlamaIndex, and set a unique attribute so I can find that span later.
2. Run Ragas against the query results.
3. Find the original span and annotate it with the Ragas results.
So:
1. Is this the right solution to my desire to be able to view Ragas results and spans?
2. If not, what is the right solution?
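The first step of that plan (tagging the query so its spans can be found later) could look roughly like the Python sketch below. It assumes the LlamaIndex spans are produced by an OpenInference instrumentor, which copies metadata set through the `using_metadata` context manager onto the spans created inside it. The `query_id` metadata key, the `q-` prefix, and both function names are made up for illustration, and `query_engine` stands for whatever LlamaIndex query engine is in use.

```python
import uuid
from typing import Any, Tuple


def new_query_id() -> str:
    """Return a unique tag for one query (hypothetical naming scheme)."""
    return f"q-{uuid.uuid4().hex}"


def run_tagged_query(query_engine: Any, question: str) -> Tuple[str, Any]:
    """Run a query with a unique ID attached to the spans it creates."""
    # Imported lazily so new_query_id() works without OpenInference installed.
    from openinference.instrumentation import using_metadata

    query_id = new_query_id()
    # Spans created inside this block carry {"query_id": ...} as metadata.
    with using_metadata({"query_id": query_id}):
        response = query_engine.query(question)
    return query_id, response
```

Keeping the returned `query_id` next to the question and answer gives the later Ragas step something specific to search for, which avoids relying on "the last span".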
Mikyo
·
1. Is this the right solution to my desire to be able to view Ragas Results and Spans?
The right solution right now is to use something called a SpanQuery to pull the spans that match specific criteria (https://arize.com/docs/phoenix/tracing/how-to-tracing/importing-and-exporting-traces/extract-data-from-spans#running-span-queries). You can then log the Ragas evaluations back using the annotations API (https://arize.com/docs/phoenix/tracing/how-to-tracing/feedback-and-annotations/capture-feedback). I'll get back to you when I'm out of my morning meetings. Thanks for your patience.
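A minimal Python sketch of the SpanQuery half of this suggestion follows. It assumes the spans of interest carry a metadata entry that identifies the query; the `query_id` key, the sample value, and both function names are hypothetical. The resulting DataFrame is expected to be keyed by span ID, which is the value needed when logging evaluations back.

```python
def metadata_filter(key: str, value: str) -> str:
    """Build a SpanQuery filter expression that matches one metadata entry."""
    # repr() quotes the strings so the result is a valid filter expression.
    return f"metadata[{key!r}] == {value!r}"


def find_spans_for_query(query_id: str):
    """Fetch the spans tagged with `query_id` from a running Phoenix instance."""
    # Imported lazily so metadata_filter() works without Phoenix installed.
    import phoenix as px
    from phoenix.trace.dsl import SpanQuery

    query = SpanQuery().where(metadata_filter("query_id", query_id))
    return px.Client().query_spans(query)


# Hypothetical query ID, shown only to illustrate the generated expression.
expression = metadata_filter("query_id", "q-123")
```

This only works if something unique was attached to the spans when the query ran, so the filter has a value to match.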
Kurt T.
·
Ok, I see how the SpanQuery could work.... But the problem is: What am I supposed to query for?
Kurt T.
·
(there are various sources that say "just grab the last span"... which is not a reasonable solution in anything that is multi-user)
Mikyo
·
Hey Kurt, back for a breather. It can vary depending on your use case. You typically have to query for all the data that's needed for the eval. "Just grab the last span" does seem naive here.
Kurt T.
·
This seems incredibly straightforward: A query happens and it creates a span. How do I get the span_id immediately at the point the query has either started or finished?
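One possible answer on the OpenTelemetry side, sketched in Python: open a span of your own around the query and read its ID from the span context while it is still active. This assumes the global tracer provider is the same one that exports to Phoenix, so the LlamaIndex spans become children of the wrapper span. The span name `rag-query` and both function names are illustrative, and `query_engine` stands for the LlamaIndex query engine in use.

```python
from typing import Any, Tuple


def span_id_hex(span_id: int) -> str:
    """Format an OpenTelemetry integer span ID as 16 lowercase hex characters."""
    return format(span_id, "016x")


def run_query_in_span(query_engine: Any, question: str) -> Tuple[str, Any]:
    """Run a query inside a wrapper span and return that span's ID."""
    # Imported lazily so span_id_hex() works without OpenTelemetry installed.
    from opentelemetry import trace

    tracer = trace.get_tracer(__name__)
    with tracer.start_as_current_span("rag-query") as span:
        response = query_engine.query(question)
        # The span context holds the ID as an integer; convert it to hex.
        span_id = span_id_hex(span.get_span_context().span_id)
    return span_id, response
```

With this approach the ID belongs to the wrapper span rather than to any individual LlamaIndex child span, so evaluations keyed on it would attach at the top of the trace for that query.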
Mikyo
·
This is a LlamaIndex-created span, I presume?
Kurt T.
·
LlamaIndex is driving the queries. It's creating spans via OpenTelemetry
Kurt T.
·
So I guess this is an OpenTelemetry question?
Kurt T.
·
Oh. I may have solved it. How in the Phoenix UI do I find the value for a span_id?
