Hi, I'm currently running some custom evaluations using llm_classify from phoenix.evals as part of an LLM-as-a-judge workflow. Our goal is to evaluate model performance on a local DataFrame and visualize the results in Phoenix. How can we log these custom evaluations to Phoenix so that they appear in the dashboard for performance tracking? Thanks in advance!
To log your custom llm_classify evaluations to Phoenix and visualize them in the dashboard, use the px.Client().log_evaluations() method with a SpanEvaluations object containing your results DataFrame. Ensure your DataFrame includes a span_id column to associate each evaluation with the correct span. For example:

```
from phoenix.trace import SpanEvaluations

px.Client().log_evaluations(
    SpanEvaluations(eval_name="Your Eval Name", dataframe=your_eval_results_df)
)
```

This will make your custom evals visible and sortable in the Phoenix UI, provided your DataFrame is based on spans exported from Phoenix or includes the correct identifiers. For more details and requirements, see the official log evaluation results guide and the bring your own evaluator documentation.

Would you like a step-by-step example or help troubleshooting a specific issue?

Sources:
- Bring Your Own Evaluator
- log evaluation results guide
💡 Hint: Mention RunLLM in the thread for followups.
You can find the span_id column by exporting your spans from Phoenix using the get_spans_dataframe() method, which returns a DataFrame with context.span_id as a column or index. Example:

```
import phoenix as px

spans_df = px.Client().get_spans_dataframe()
print(spans_df.columns)
```

The span_id is typically under the column context.span_id. Use this value to link your evaluations to the correct spans. For more, see the extract data from spans guide.

Would you like more detail on extracting or using this column?

Sources:
- extract data from spans guide
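As a concrete illustration of linking evals to that column (the spans frame below is a mock stand-in, since a real get_spans_dataframe() call needs a live Phoenix session, and the column names beyond context.span_id are illustrative):

```python
import pandas as pd

# Stand-in for px.Client().get_spans_dataframe(), which needs a live
# Phoenix session; the real export has many more columns.
spans_df = pd.DataFrame(
    {
        "context.span_id": ["span-aaa", "span-bbb"],
        "attributes.input.value": ["What is 2+2?", "Capital of France?"],
    }
)

# Keep just the identifier and attach evaluation results; here they are
# assigned by position, but in practice you would join on a preserved key.
eval_df = spans_df[["context.span_id"]].copy()
eval_df["label"] = ["correct", "incorrect"]

print(eval_df.columns.tolist())  # -> ['context.span_id', 'label']
```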
Datasets uploaded directly to Phoenix (e.g., via upload_dataset) do not automatically include a span_id unless you explicitly add it as a column. Span IDs are typically present when datasets are created from traced spans, not from arbitrary uploads. See creating datasets from spans.

Would you like guidance on how to add span IDs to your dataset?

Sources:
- creating datasets from spans
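If you do need span ids on an uploaded dataset, one option is to join it against your exported spans on a shared key. The column names and join key below are illustrative assumptions, not a fixed Phoenix schema:

```python
import pandas as pd

# Exported spans (mocked; in practice from get_spans_dataframe()).
spans_df = pd.DataFrame(
    {
        "context.span_id": ["span-aaa", "span-bbb"],
        "attributes.input.value": ["What is 2+2?", "Capital of France?"],
    }
)

# Uploaded dataset rows: no span ids, but the same input text.
dataset_df = pd.DataFrame({"input": ["Capital of France?", "What is 2+2?"]})

# Attach span ids by joining on the shared input text.
merged = dataset_df.merge(
    spans_df.rename(columns={"attributes.input.value": "input"}),
    on="input",
    how="left",
)
print(merged["context.span_id"].tolist())
```

This only works when the shared key uniquely identifies each span; duplicate inputs would need a more specific key.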
Certainly! Here's a step-by-step guide to log evaluations to Phoenix from a dataset containing traced spans:

1. **Export Spans DataFrame**: Retrieve your spans with span IDs.

```
import phoenix as px

spans_df = px.Client().get_spans_dataframe()
```

2. **Prepare Evaluation DataFrame**: Create a DataFrame with at least a `span_id` column (matching your spans) and your evaluation results (e.g., `label`, `score`).

```
eval_df = spans_df[["context.span_id"]].copy()
eval_df["label"] = ...  # your evaluation logic here
eval_df["score"] = ...  # optional, your evaluation score
eval_df.rename(columns={"context.span_id": "span_id"}, inplace=True)
```

3. **Log Evaluations to Phoenix**: Use `SpanEvaluations` and `log_evaluations`.

```
from phoenix.trace import SpanEvaluations

px.Client().log_evaluations(
    SpanEvaluations(eval_name="Your Eval Name", dataframe=eval_df)
)
```

After this, your evaluations will appear in the Phoenix UI attached to the corresponding spans. For more, see the [official guide](https://docs.arize.com/phoenix/tracing/how-to-tracing/feedback-and-annotations/llm-evaluations).

Would you like a full code example with mock evaluation logic?

Sources:
- [https://github.com/Arize-ai/phoenix/blob/main/docs/evaluation/how-to-evals/evaluating-phoenix-traces.md](https://github.com/Arize-ai/phoenix/blob/main/docs/evaluation/how-to-evals/evaluating-phoenix-traces.md)
- [https://docs.arize.com/phoenix/tracing/how-to-tracing/feedback-and-annotations/llm-evaluations#%3AR14d9uucqfkvfa%3A](https://docs.arize.com/phoenix/tracing/how-to-tracing/feedback-and-annotations/llm-evaluations#%3AR14d9uucqfkvfa%3A)
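For reference, the three steps above can be combined into one runnable sketch. Mock span ids and a trivial rule-based check stand in for real traces and an LLM judge, and the final logging call is commented out because it needs a running Phoenix server:

```python
import pandas as pd

# Step 1 (mocked): in a real run, spans_df = px.Client().get_spans_dataframe()
spans_df = pd.DataFrame(
    {
        "context.span_id": ["span-aaa", "span-bbb", "span-ccc"],
        "attributes.output.value": ["4", "Paris", "I don't know"],
    }
)

# Step 2: build the eval DataFrame; a trivial rule stands in for llm_classify.
eval_df = spans_df[["context.span_id"]].copy()
eval_df["label"] = [
    "correct" if out != "I don't know" else "incorrect"
    for out in spans_df["attributes.output.value"]
]
eval_df["score"] = (eval_df["label"] == "correct").astype(int)
eval_df = eval_df.rename(columns={"context.span_id": "span_id"})

# Step 3: log to Phoenix (needs a live server, so commented out here).
# import phoenix as px
# from phoenix.trace import SpanEvaluations
# px.Client().log_evaluations(
#     SpanEvaluations(eval_name="Mock Correctness", dataframe=eval_df)
# )

print(eval_df["score"].sum())  # -> 2
```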
We restructured our workflow and ended up with this solution:

1. Make a brand new project
2. Trace LLM calls
3. Fetch the LLM calls with px_client.query_spans
4. Use our custom eval function on the output
5. Finally, use px.Client().log_evaluations with the eval dataframe to log it into Phoenix 😄

Are there any other solutions that would work better?
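The fetch-and-evaluate part of that workflow could be sketched as follows. The SpanQuery usage follows Phoenix's query DSL, but the fetch is wrapped in a function and not executed here since it needs a live session; the mock frame just exercises the eval step:

```python
import pandas as pd

def fetch_llm_spans():
    """Fetch LLM spans from a running Phoenix instance (not called here)."""
    import phoenix as px
    from phoenix.trace.dsl import SpanQuery

    # Pull only LLM spans, keeping input/output for the evaluator.
    query = SpanQuery().where("span_kind == 'LLM'").select(
        input="input.value", output="output.value"
    )
    return px.Client().query_spans(query)

def run_custom_eval(spans_df: pd.DataFrame) -> pd.DataFrame:
    """Mock custom eval: flags empty outputs. Replace with your own judge."""
    eval_df = pd.DataFrame(index=spans_df.index)
    eval_df["label"] = [
        "ok" if isinstance(o, str) and o.strip() else "empty"
        for o in spans_df["output"]
    ]
    return eval_df

# Offline check with a mock frame indexed by span id.
mock = pd.DataFrame(
    {"input": ["hi"], "output": ["hello"]},
    index=pd.Index(["span-aaa"], name="context.span_id"),
)
print(run_custom_eval(mock)["label"].tolist())  # -> ['ok']
```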
Yes 🔒[private user], I can join a huddle in 5 minutes.
