Hello Phoenix Team, I have a dataframe that contains AI predictions as well as ground-truth label columns, and I am able to upload it as a dataset. Now I calculate accuracy, precision, recall, and F1 using the scikit-learn library. How can I log those results as an Experiment in Phoenix for that dataset?
You can log your own calculated global metrics (accuracy, precision, recall, F1) to Phoenix by creating a DataFrame with these metrics and a dummy span_id, then using px.Client().log_evaluations with SpanEvaluations. This will display your metrics in the Phoenix UI as part of the experiment results. Example:

import pandas as pd
from phoenix.trace import SpanEvaluations
import phoenix as px

metrics_df = pd.DataFrame({
    'span_id': [1],  # Dummy span_id for global metrics
    'accuracy': [accuracy],
    'precision': [precision],
    'recall': [recall],
    'f1': [f1]
})

client = px.Client()
client.log_evaluations(SpanEvaluations(eval_name="Global Metrics", dataframe=metrics_df))

Phoenix will display these metrics in the UI if they are part of the logged evaluations. For more, see this Slack thread.

Would you like more detail or a full example?

Sources:
- Phoenix Slack: Logging global metrics
💡 Hint: Mention RunLLM in the thread for followups.
I have logged my dataset to Phoenix, but I struggle to implement the whole code because I don't understand how I can retrieve the entire dataframe inside the task function. Based on the docs, it seems like the task iterates over each row of the dataframe, but I calculate my metrics over the entire dataframe at once.
import pandas as pd
from phoenix.trace import SpanEvaluations
import phoenix as px

metrics_df = pd.DataFrame({
    'span_id': [1],  # Dummy span_id for global metrics
    'accuracy': [accuracy],
    'precision': [precision],
    'recall': [recall],
    'f1': [f1]
})

client = px.Client()
client.log_evaluations(SpanEvaluations(eval_name="Global Metrics", dataframe=metrics_df))

@RunLLM I am trying to use this code snippet; however, I struggle to use the new self-contained client in place of px.Client() in this code. Can you please tell me how I can log evaluations with the new self-contained client?
The new self-contained client in Phoenix (as of recent versions) does not support logging evaluations in the same way as px.Client().log_evaluations. The arize-phoenix-client package (e.g., v1.13.2) does not have a log_evaluations method or an equivalent API for logging evaluations directly; this functionality is only available in the main phoenix package, not in the standalone client. You must use the main phoenix package and px.Client().log_evaluations for this workflow. See this GitHub issue for details.

If you are restricted to the self-contained client, you can only log span annotations using log_span_annotations_dataframe, not full evaluation DataFrames. For more, see this comment.

Would you like more detail or an example using log_span_annotations_dataframe?

Sources:
- GitHub Issue: How to log evaluations?
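If you do end up restricted to the self-contained client, the annotation route might look like the sketch below. The method name log_span_annotations_dataframe comes from the thread above, but the exact namespace, call signature, and accepted columns are assumptions; verify them against your installed arize-phoenix-client version.

```python
import pandas as pd

# from phoenix.client import Client  # arize-phoenix-client package (assumed import path)

# One annotation row per metric; span_id must reference a real span in your project.
annotations_df = pd.DataFrame({
    "span_id": ["<your-span-id>"] * 4,   # placeholder, not a real span id
    "name": ["accuracy", "precision", "recall", "f1"],
    "annotator_kind": ["CODE"] * 4,      # programmatic (non-human) annotation
    "score": [0.90, 0.88, 0.91, 0.89],   # replace with your scikit-learn values
})

# Hypothetical call shape -- check your client version before relying on it:
# Client().annotations.log_span_annotations_dataframe(dataframe=annotations_df)
```

The DataFrame shape (one row per metric, keyed by span_id) is the part that carries over regardless of the exact client method.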
import pandas as pd
from phoenix.trace import SpanEvaluations
import phoenix as px

metrics_df = pd.DataFrame({
    'span_id': [1],  # Dummy span_id for global metrics
    'accuracy': [accuracy],
    'precision': [precision],
    'recall': [recall],
    'f1': [f1]
})

client = px.Client()
client.log_evaluations(SpanEvaluations(eval_name="Global Metrics", dataframe=metrics_df))

@RunLLM If I use the code above to log_evaluations, where in the Phoenix UI can I find them?
If you log global metrics like accuracy, precision, recall, and F1 using log_evaluations as shown, they will appear in the Phoenix UI as evaluation annotations attached to the relevant project or experiment. These metrics are displayed in the evaluation results or annotation panels, typically at the top of the experiment or trace view, depending on your workflow and UI version. For experiments, summary metrics are shown in the experiment table (see screenshot).

Note: The metrics will only be visible if your DataFrame is linked to a valid project/session and the UI supports displaying global (non-row-level) metrics. For more, see this PR and docs.

Would you like more detail or a UI walkthrough?

Sources:
- Phoenix PR: experiment evaluation summaries in the table
Hello Priyan, Thank you for the response. Can you send me some pointers or code snippets on how to log metrics to all datapoints in my dataset?
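One common pattern for attaching a global score to every datapoint is to broadcast the single value across all span ids. This is a sketch that assumes the px.Client() evaluation API shown earlier in this thread; the hard-coded span_ids stand in for the index you would get from px.Client().get_spans_dataframe() in a live session.

```python
import pandas as pd

# import phoenix as px
# from phoenix.trace import SpanEvaluations

accuracy = 0.90  # your scikit-learn result

# In a live session these would come from px.Client().get_spans_dataframe().index:
span_ids = ["span-a", "span-b", "span-c"]

# Broadcast the single global score to every span.
eval_df = pd.DataFrame(
    {"score": [accuracy] * len(span_ids)},
    index=pd.Index(span_ids, name="span_id"),
)

# px.Client().log_evaluations(SpanEvaluations(eval_name="accuracy", dataframe=eval_df))
```

SpanEvaluations expects the evaluation DataFrame to be keyed by span_id, which is why the ids form the index here.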
Priyan, quick question. I am using this as an example: https://arize.com/docs/phoenix/cookbook/tracing-and-annotations/generating-synthetic-datasets-for-llm-evaluators-and-agents#upload-agent-dataset What I don't understand is how the task function in this experiment receives the input and reference arguments while running the experiment. I don't see anything being passed in the run_experiment call in that example. It's very confusing to me how I actually pass arguments to the task and evaluator functions.
Ok, got it. And how do I access metadata columns in the task and evaluator? Do I just use the word metadata?
def task(input, metadata):
    question = input["question"]
    answer = metadata["experiment_id"]
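For intuition: run_experiment-style harnesses typically inspect the task's signature and pass in only the dataset fields the function asks for by name, which is why nothing is passed explicitly at the call site. Here is a minimal sketch of that binding pattern; it is not Phoenix's actual implementation, just an illustration of the mechanism.

```python
import inspect

def bind_and_call(fn, example):
    # Fields a run_experiment-style harness commonly exposes by parameter name.
    available = {
        "input": example["input"],
        "expected": example.get("expected"),
        "metadata": example.get("metadata", {}),
        "example": example,
    }
    # Pass only the arguments the function's signature declares.
    params = inspect.signature(fn).parameters
    kwargs = {name: available[name] for name in params if name in available}
    return fn(**kwargs)

def task(input, metadata):
    # Parameter names select which fields the harness injects.
    return f"{input['question']} / {metadata['experiment_id']}"

example = {
    "input": {"question": "What is Phoenix?"},
    "metadata": {"experiment_id": "exp-1"},
}
result = bind_and_call(task, example)
```

Under this scheme, renaming a parameter (e.g. metadata) changes which field the harness binds, which matches the behavior the docs describe for task and evaluator functions.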