Hi, in this tutorial: https://docs.arize.com/phoenix/evaluation/how-to-evals/running-pre-tested-evals/q-and-a-on-retrieved-data — how can I send the results back to Phoenix?
To send evaluation results back to Phoenix, you can use the `log_evaluations` method of the `px.Client` class. This method allows you to log evaluation results, such as those from a Q&A evaluation, back to Phoenix for visualization. Here's a brief overview of the process:

1. Prepare Evaluation Data: Ensure your evaluation results are in a DataFrame format, with necessary columns like `context.span_id` to associate evaluations with specific spans.
2. Log Evaluations: Use the `log_evaluations` method to send the evaluation data to Phoenix. You need to create `SpanEvaluations` or `DocumentEvaluations` objects with your evaluation DataFrame.

Example code snippet:

```python
import phoenix as px
from phoenix.trace import SpanEvaluations

# Assuming `qa_eval_df` is your DataFrame with evaluation results
px.Client().log_evaluations(
    SpanEvaluations(eval_name="Q&A Correctness", dataframe=qa_eval_df)
)
```

Would you like more details on any specific part of this process?

Sources:
- https://github.com/Arize-ai/phoenix/issues/2309
- https://github.com/arize-ai/phoenix/blob/main/tutorials/mistral/evaluate_rag--mistral.ipynb
💡 Hint: Mention RunLLM in the thread for followups.
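As a small supplement to the answer above (an assumption based on the `SpanEvaluations` API, not stated in the thread): Phoenix associates each evaluation row with a span via a `context.span_id` index or column, so the evaluation DataFrame might look like the toy sketch below. The span IDs here are placeholders, and the actual logging call is left commented out since it needs a running Phoenix instance.

```python
import pandas as pd

# Toy evaluation results; real span IDs would come from your traced spans.
qa_eval_df = pd.DataFrame(
    {
        "label": ["correct", "incorrect", "correct"],
        "score": [1, 0, 1],
        "explanation": ["...", "...", "..."],
    },
    index=pd.Index(["span-a", "span-b", "span-c"], name="context.span_id"),
)

# With a running Phoenix instance you would then log it back:
# import phoenix as px
# from phoenix.trace import SpanEvaluations
# px.Client().log_evaluations(
#     SpanEvaluations(eval_name="Q&A Correctness", dataframe=qa_eval_df)
# )
```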
> 1. Prepare Evaluation Data: Ensure your evaluation results are in a DataFrame format, with necessary columns like `context.span_id` to associate evaluations with specific spans.
Where does the `context.span_id` come from? The dataset is:
```python
from phoenix.evals import download_benchmark_dataset  # import added for context

N_EVAL_SAMPLE_SIZE = 3

df = download_benchmark_dataset(
    task="qa-classification", dataset_name="qa_generated_dataset"
)

df_sample = (
    df.sample(n=N_EVAL_SAMPLE_SIZE)
    .reset_index(drop=True)
    .rename(
        columns={
            "question": "input",
            "context": "reference",
            "sampled_answer": "output",
        }
    )
)
```

Found I asked a similar question here: https://arize-ai.slack.com/archives/C04R3GXC8HK/p1738917892261619?thread_ts=1737099011.156239&cid=C04R3GXC8HK — this one can be ignored.
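A guess at the answer (not from the thread, so treat it as an assumption): the benchmark dataset is standalone and carries no span IDs, because its rows never passed through a tracer. `context.span_id` appears when you evaluate rows exported from your own traces, e.g. via `px.Client().get_spans_dataframe()`, whose index identifies each span. A toy pandas sketch of the idea, with a stand-in for the exported spans frame:

```python
import pandas as pd

# Toy stand-in for the frame returned by px.Client().get_spans_dataframe(),
# which (assumption) is indexed by span ID.
spans_df = pd.DataFrame(
    {"input": ["q1", "q2"], "output": ["a1", "a2"]},
    index=pd.Index(["span-a", "span-b"], name="context.span_id"),
)

# Eval results computed on those rows keep the same index, so Phoenix can
# associate each label with its span when the frame is logged back.
qa_eval_df = spans_df.assign(label=["correct", "incorrect"])[["label"]]

print(qa_eval_df.index.name)  # context.span_id
```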
