Evaluations most often fail to show up on examples when the logged results are not associated with the correct spans. According to <https://docs.arize.com/phoenix/evaluation/how-to-evals/bring-your-own-evaluator#%3AR6cd9uucqfkvfa%3A|Phoenix documentation>, when logging evaluations, ensure that your `test_results` dataframe includes a `context.span_id` column with the corresponding span ID for each row. This is what allows Phoenix to attach each evaluation to the correct example.
Additionally, if you are manually constructing dataframes for evaluations, ensure they match the schema Phoenix's logging methods expect. The <https://github.com/Arize-ai/phoenix/issues/6525|log_evaluations method> is intended for dataframes computed via evaluation helpers like `run_evals`, so a hand-built dataframe must mirror that structure (span IDs plus label/score/explanation columns). If evaluations still are not appearing, verify that both the data structure and the logging call align with Phoenix's requirements.
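As a rough illustration, a manually constructed dataframe might look like the sketch below. The span IDs, evaluation name, and column values are placeholders (you would use the real `context.span_id` values from your traces); the Phoenix logging call is shown commented as a sketch of the `SpanEvaluations` / `log_evaluations` pattern described in the docs.

```python
import pandas as pd

# Placeholder span IDs for illustration -- replace with the actual
# context.span_id values from your traced examples.
test_results = pd.DataFrame(
    {
        "context.span_id": ["7e2f08cb43bbbf5c", "0f1a2b3c4d5e6f70"],
        "label": ["correct", "incorrect"],
        "score": [1, 0],
        "explanation": [
            "Answer matches the reference.",
            "Answer contradicts the reference.",
        ],
    }
).set_index("context.span_id")  # Phoenix matches evals to spans via this index

# Sketch of the logging step (assumes a running Phoenix instance):
# import phoenix as px
# from phoenix.trace import SpanEvaluations
#
# px.Client().log_evaluations(
#     SpanEvaluations(eval_name="My Eval", dataframe=test_results)
# )
```

If the index (or a `context.span_id` column) is missing, Phoenix has no way to associate the rows with spans, which matches the symptom of evaluations not appearing on examples.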
Would you like more detail on how to ensure your evaluations are correctly logged and displayed?
Sources:
- <https://github.com/Arize-ai/phoenix/issues/6525|GitHub Issue #6525>
- <https://docs.arize.com/phoenix/evaluation/how-to-evals/bring-your-own-evaluator#%3AR6cd9uucqfkvfa%3A|Phoenix documentation>