Hi, I’m running experiments and logging evaluation results to Phoenix. Sometimes the feedback isn’t updated. How can I debug this? The code is here: https://gist.github.com/goofansu/d01464f95ead23695e8075c0020f4a61 arize-phoenix version: 8.6.1
To debug the issue of feedback not updating in Phoenix, consider the following steps:
1. Check environment variables: Ensure that environment variables such as PHOENIX_COLLECTOR_ENDPOINT correctly point to the Phoenix server. This is crucial for communication between your application and Phoenix (source).
2. Verify API usage: Make sure you are using the correct API endpoints and methods for logging evaluations. The px.Client().log_evaluations() method should be called with appropriately formatted dataframes (source).
3. Check for errors: Look at the logs of both your application and the Phoenix server for errors or warnings that might indicate why feedback is not being updated. These can provide insight into issues with the data being sent or received (source).
Would you like more detailed steps or specific code examples?
Sources:
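For step 1, a quick sanity check can confirm which collector endpoint your process will actually use and whether it is reachable. This is a minimal sketch, not part of the Phoenix API; `http://localhost:6006` is assumed here as the default local Phoenix address, and the reachability probe is best-effort:

```python
import os
from urllib.request import urlopen
from urllib.error import URLError


def phoenix_endpoint(default="http://localhost:6006"):
    """Return the collector endpoint the client would use, falling back to a default."""
    return os.environ.get("PHOENIX_COLLECTOR_ENDPOINT", default)


def endpoint_reachable(endpoint, timeout=3):
    """Best-effort connectivity check against the Phoenix server."""
    try:
        urlopen(endpoint, timeout=timeout)
        return True
    except (URLError, ValueError, OSError):
        return False


if __name__ == "__main__":
    ep = phoenix_endpoint()
    print(f"PHOENIX_COLLECTOR_ENDPOINT resolves to: {ep}")
    print(f"reachable: {endpoint_reachable(ep)}")
```

If the endpoint resolves to the wrong host or is unreachable, evals are silently going nowhere, which matches the symptom in the original question.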
- https://arize-ai.slack.com/archives/C04R3GXC8HK/p1722778288.923159
- https://github.com/Arize-ai/phoenix/issues/6525
these evals are sent before the span is received by the server, so there’s a chance they missed each other just long enough that the server had nowhere to put the evals
also, these are the evals produced by the experiment’s “evaluator”
so maybe there could be a better approach here since they are capturing the same info
i guess the output of the “evaluator” could be surfaced better in the UI
i think these “evaluator” spans shouldn’t need to be annotated (for a second time), since they’re mostly there for record keeping after running the evaluators. The output of the evaluator should have already been used to annotate the experiment runs themselves, which are the real subjects of evaluation.
maybe we should surface the output of the evaluators in the output column of the table. that would show the same info as what the screenshot is attempting to do
Dustin N. No worries, maybe I’m just using the wrong way to log feedback. Roger Y. I also think it’s not necessary to call log_evaluations manually in the evaluator. But if I just run evaluation in experiments without log_evaluations, there is no feedback in the evaluator traces. By feedback I mean the screenshot:
Roger Y. For my requirement, I switched to evaluating the AGENT spans instead of running an experiment:
Log traces of my agent with question, answer, and reference.
Run run_evals against the AGENT-kind spans to get evaluation results.
Call log_evaluations with the above evaluation results.
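A rough sketch of what those three steps can look like. The span-fetch and logging calls are shown as comments since they need a running Phoenix server and model credentials; the dataframe built by hand below only illustrates the shape that log_evaluations expects (an index of `context.span_id` with label/score/explanation columns), with placeholder span IDs and values:

```python
import pandas as pd

# Fetching AGENT spans and running evals would look roughly like
# (requires a Phoenix server and LLM credentials):
#
#   import phoenix as px
#   from phoenix.evals import run_evals, QAEvaluator, OpenAIModel
#   spans_df = px.Client().get_spans_dataframe("span_kind == 'AGENT'")
#   qa_df, = run_evals(
#       dataframe=spans_df,
#       evaluators=[QAEvaluator(OpenAIModel())],
#       provide_explanation=True,
#   )

# The shape run_evals produces and log_evaluations consumes: a dataframe
# indexed by context.span_id, one row of feedback per span. Built by hand
# here purely for illustration:
qa_df = pd.DataFrame(
    {
        "label": ["correct", "incorrect"],
        "score": [1.0, 0.0],
        "explanation": ["answer matches reference", "answer contradicts reference"],
    },
    index=pd.Index(["span-id-1", "span-id-2"], name="context.span_id"),
)

# Attaching the results to the spans as feedback would then be:
#
#   from phoenix.trace import SpanEvaluations
#   px.Client().log_evaluations(
#       SpanEvaluations(eval_name="QA Correctness", dataframe=qa_df)
#   )
```

Because the spans already exist before log_evaluations runs here, this avoids the race described earlier where evals arrive before the server has the span to attach them to.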
Now there is feedback with explanations on each span. My understanding now is:
Experiment evaluations are used to track overall benchmarks.
Span evaluations are used to see the details of the feedback.
Correct me if I’m wrong. Thanks
Yeah, I think that’s correct. Incidentally, the Feedback section in the UI also has another purpose: you can annotate manually via the “Annotate” button in the upper right.
I see. That’s probably a good convenience enhancement we can add to smooth this process for you. We appreciate you bringing this to our attention.
can i ask what method you use to download the root spans?
