Hi, I followed https://docs.arize.com/phoenix/evaluation/how-to-evals/bring-your-own-evaluator#logging-evaluations-to-phoenix and am trying to save the evaluation results to Phoenix, but it reports: ValueError: The dataframe index must be ['context.span_id'] but was '[None]'. My question is: where does context.span_id come from? Even after I added the index manually, it still cannot save the results. The code is: https://gist.github.com/goofansu/da6f1a7e07a3099021d93b60367411fa
It seems likely that context.span_id is the identifier Phoenix uses to associate evaluation results with specific spans. According to the Phoenix documentation, the span ID must be present as the index (or a column) of the evaluation DataFrame for evaluations to be logged correctly. This identifier is generated during the tracing process and should be extracted from the trace data.

If manually adding context.span_id does not resolve the issue, the span IDs may be incorrect or missing. You should ensure that the span IDs in your DataFrame match those in the Phoenix trace data. You can retrieve the correct span IDs using methods like get_qa_with_reference or get_retrieved_documents from the Phoenix library, as mentioned in the Phoenix session evaluation code.

Would you like more detail on how to extract and use context.span_id correctly? Sources:
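For context, a minimal pandas sketch of what that ValueError is checking for. The span IDs below are made-up stand-ins, not real Phoenix IDs; logging only succeeds when the index values match spans that actually exist in Phoenix.

```python
import pandas as pd

# Phoenix's log_evaluations requires the eval DataFrame to be indexed by
# "context.span_id". These IDs are hypothetical placeholders; in practice
# they must come from real traces (e.g. the index of the spans DataFrame
# pulled from Phoenix).
eval_df = pd.DataFrame(
    {"label": ["correct", "incorrect"], "score": [1.0, 0.0]},
    index=pd.Index(
        ["7e2f08cb43bbbf71", "f8e5d9a1b2c3d4e5"], name="context.span_id"
    ),
)

print(eval_df.index.name)  # context.span_id
```

Naming the index satisfies the check that raised the ValueError, but fabricated IDs still have nothing to attach to, which is why pulling real spans matters.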
- Phoenix session evaluation code
- Phoenix documentation
But I didn't use data from Phoenix; I just faked some data and tried the evaluation.
Hey James S., we definitely recommend pulling that dataframe from Phoenix rather than constructing it manually. Can you add some traces to Phoenix and then try trace_df = px.Client(endpoint="http://127.0.0.1:6006").get_spans_dataframe()?
Ah, yes, I鈥檓 just trying:
from phoenix.otel import register
from openinference.instrumentation.litellm import LiteLLMInstrumentor
tracer_provider = register(
    endpoint="http://localhost:6006/v1/traces",
)
LiteLLMInstrumentor().instrument(tracer_provider=tracer_provider)
pulling that dataframe from Phoenix rather than constructing it manually.
Two questions:
Can I pull a dataset from Phoenix and convert it to a DataFrame? llm_classify requires the data to be a DataFrame or a list.
If llm_classify can run on the pulled dataset, is context.span_id added to the result automatically?
Thanks
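A rough offline sketch of the idea behind question 2 (fake_eval and the span IDs here are made-up stand-ins, not Phoenix APIs): the spans DataFrame pulled from Phoenix is indexed by context.span_id, so any per-row evaluation that reuses that index carries the span IDs through into the result with no manual bookkeeping.

```python
import pandas as pd

# Toy stand-in for a spans DataFrame pulled from Phoenix: in the real case
# px.Client().get_spans_dataframe() returns a frame indexed by context.span_id.
spans_df = pd.DataFrame(
    {"input": ["q1", "q2"], "output": ["a1", "b2"]},
    index=pd.Index(["aaa111", "bbb222"], name="context.span_id"),
)

def fake_eval(row: pd.Series) -> str:
    # Placeholder for a real evaluator such as llm_classify.
    return "correct" if row["output"].startswith("a") else "incorrect"

# Row-wise evaluation over spans_df keeps the span-ID index intact.
eval_df = spans_df.apply(fake_eval, axis=1).to_frame("label")

print(list(eval_df.index))  # ['aaa111', 'bbb222']
```

Because the result is still indexed by context.span_id, it is in the shape Phoenix expects when logging evaluations back.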
Can I pull a dataset from Phoenix and convert it to a DataFrame? llm_classify requires the data to be a DataFrame or a list.
There is dataset.as_dataframe. In fact, the problem is that I don't know which function I should use to run evaluations; there are several:
phoenix.experiments.run_experiment against a dataset. The result is saved in Phoenix.
phoenix.experiments.eval_experiment against an experiment. The result is saved in Phoenix.
phoenix.evals.llm_classify against a DataFrame or list. The result isn't saved in Phoenix.
When should I use phoenix.evals.llm_classify, given that it doesn't save the result to Phoenix? Should I use Phoenix's built-in evaluators in experiments instead?
Experiments already have inputs, outputs, and expected values defined on each experiment run, so defining a function that takes those values and evaluates them is much more straightforward. We provide examples in our documentation and supply some evaluators by default: https://docs.arize.com/phoenix/datasets-and-experiments/how-to-experiments/using-evaluators
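A minimal sketch of such an evaluator (exact_match here is a made-up example, not a Phoenix built-in): as described above, experiment runs already carry the output and expected values, so an evaluator can be a plain function whose parameters are bound by name from each run.

```python
# Hypothetical evaluator in the style Phoenix experiments expect: a plain
# function whose parameters (e.g. output, expected) are filled in by name
# from each experiment run. It would be passed via evaluators=[exact_match]
# to run_experiment; here it is just called directly for illustration.
def exact_match(output: str, expected: str) -> float:
    # Score 1.0 when the task output equals the labeled answer, else 0.0.
    return float(output == expected)

print(exact_match(output="Paris", expected="Paris"))  # 1.0
print(exact_match(output="Lyon", expected="Paris"))   # 0.0
```

This is why experiments sidestep the context.span_id bookkeeping entirely: Phoenix matches results to runs itself and stores the scores.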
