Hi, I've been running some experiments on Phoenix, and things have been going well. I'm now looking into best practices for incorporating human evaluation for a specific metric. Is it possible for a human reviewer to log into the Phoenix platform and directly add their evaluations there? Or would it be better to collect the evaluations through another tool and then import the results into Phoenix? Would love to hear your suggestions on this matter 😄
Yes, human reviewers can directly add evaluations in Phoenix. Phoenix supports human annotations through its UI, allowing users to add, edit, view, and delete annotations on traces and spans. This can be useful for manual quality assurance and sharing insights within a team. According to Annotating Traces, Phoenix provides a simple mechanism to collect user feedback, such as thumbs-up or thumbs-down buttons, which can be logged as annotations.

Alternatively, you can collect evaluations through another tool and import the results into Phoenix. This can be done by creating a DataFrame with the necessary evaluation data and using the log_evaluations method to upload it to Phoenix. This approach is useful if you have a large volume of evaluations or if you prefer using external tools for data collection. For more details, see Evaluating Phoenix Traces.

Would you like more detail on either approach?

Sources:
- Annotating Traces
- Evaluating Phoenix Traces
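As a rough sketch of the import route described above: you build a pandas DataFrame of evaluation results keyed by span ID and hand it to Phoenix's log_evaluations client method. The span IDs, labels, and the "Human Review" evaluation name below are placeholders, and the upload step assumes a running Phoenix instance, so it is shown commented out.

```python
import pandas as pd

# Human-review results keyed by span ID. Phoenix expects span
# evaluations to be indexed by "context.span_id"; the IDs here
# are placeholders standing in for real trace span IDs.
eval_df = pd.DataFrame(
    {
        "context.span_id": ["span_a", "span_b"],
        "label": ["good", "bad"],
        "score": [1, 0],
        "explanation": ["clear, grounded answer", "hallucinated a source"],
    }
).set_index("context.span_id")

# Upload to Phoenix (requires a running Phoenix instance):
# import phoenix as px
# from phoenix.trace import SpanEvaluations
#
# px.Client().log_evaluations(
#     SpanEvaluations(eval_name="Human Review", dataframe=eval_df)
# )

print(eval_df["score"].tolist())  # → [1, 0]
```

Once logged, the evaluations appear in Phoenix alongside other span-level metrics under the eval_name you chose.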
💡 Hint: Mention RunLLM in the thread for followups.
But when I annotate the traces in Phoenix, the annotation doesn't show up in the evaluator columns, like num_tokens and the semantic similarity score, for example. Is there a way this human annotation can also show up there?
Got you! When will the project you are referring to be production ready?
