Hi, in this tutorial: https://docs.arize.com/phoenix/evaluation/how-to-evals/running-pre-tested-evals/q-and-a-on-retrieved-data — how can I send the results back to Phoenix?
To send evaluation results back to Phoenix, you can use the `log_evaluations` method of the `px.Client` class. This method allows you to log evaluation results, such as those from a Q&A evaluation, back to Phoenix for visualization. Here's a brief overview of the process:

1. Prepare Evaluation Data: Ensure your evaluation results are in a DataFrame format, with necessary columns like `context.span_id` to associate evaluations with specific spans.
2. Log Evaluations: Use the `log_evaluations` method to send the evaluation data to Phoenix. You need to create `SpanEvaluations` or `DocumentEvaluations` objects with your evaluation DataFrame.

Example code snippet:

```python
import phoenix as px
from phoenix.trace import SpanEvaluations

# Assuming `qa_eval_df` is your DataFrame with evaluation results
px.Client().log_evaluations(
    SpanEvaluations(eval_name="Q&A Correctness", dataframe=qa_eval_df)
)
```

Would you like more details on any specific part of this process?

Sources:
- https://github.com/Arize-ai/phoenix/issues/2309
- https://github.com/arize-ai/phoenix/blob/main/tutorials/mistral/evaluate_rag--mistral.ipynb
💡 Hint: Mention RunLLM in the thread for followups.
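As a small supplement to the answer above (an assumption based on the `SpanEvaluations` API, not stated in the thread): Phoenix associates each evaluation row with a span via a `context.span_id` index or column, so the evaluation DataFrame might look like the toy sketch below. The span IDs here are placeholders, and the actual logging call is left commented out since it needs a running Phoenix instance.

```python
import pandas as pd

# Toy evaluation results; real span IDs would come from your traced spans.
qa_eval_df = pd.DataFrame(
    {
        "label": ["correct", "incorrect", "correct"],
        "score": [1, 0, 1],
        "explanation": ["...", "...", "..."],
    },
    index=pd.Index(["span-a", "span-b", "span-c"], name="context.span_id"),
)

# With a running Phoenix instance you would then log it back:
# import phoenix as px
# from phoenix.trace import SpanEvaluations
# px.Client().log_evaluations(
#     SpanEvaluations(eval_name="Q&A Correctness", dataframe=qa_eval_df)
# )
```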
> 1. Prepare Evaluation Data: Ensure your evaluation results are in a DataFrame format, with necessary columns like `context.span_id` to associate evaluations with specific spans.
Where does the `context.span_id` come from? The dataset is:
```python
from phoenix.evals import download_benchmark_dataset  # import added for context

N_EVAL_SAMPLE_SIZE = 3

df = download_benchmark_dataset(
    task="qa-classification", dataset_name="qa_generated_dataset"
)

df_sample = (
    df.sample(n=N_EVAL_SAMPLE_SIZE)
    .reset_index(drop=True)
    .rename(
        columns={
            "question": "input",
            "context": "reference",
            "sampled_answer": "output",
        }
    )
)
```

Found I asked a similar question here: https://arize-ai.slack.com/archives/C04R3GXC8HK/p1738917892261619?thread_ts=1737099011.156239&cid=C04R3GXC8HK — this one can be ignored.
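A guess at the answer (not from the thread, so treat it as an assumption): the benchmark dataset is standalone and carries no span IDs, because its rows never passed through a tracer. `context.span_id` appears when you evaluate rows exported from your own traces, e.g. via `px.Client().get_spans_dataframe()`, whose index identifies each span. A toy pandas sketch of the idea, with a stand-in for the exported spans frame:

```python
import pandas as pd

# Toy stand-in for the frame returned by px.Client().get_spans_dataframe(),
# which (assumption) is indexed by span ID.
spans_df = pd.DataFrame(
    {"input": ["q1", "q2"], "output": ["a1", "a2"]},
    index=pd.Index(["span-a", "span-b"], name="context.span_id"),
)

# Eval results computed on those rows keep the same index, so Phoenix can
# associate each label with its span when the frame is logged back.
qa_eval_df = spans_df.assign(label=["correct", "incorrect"])[["label"]]

print(qa_eval_df.index.name)  # context.span_id
```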
