HI!
I want to set up some custom evaluators to compare the outputs of different prompt versions. How can I set up evaluators in the UI itself? I'm using golden datasets for this.
Thanks!
To set up custom evaluators in Phoenix, you can use the Prompt Playground to test and compare different prompt versions. The Playground lets you run prompts over datasets and evaluate their performance. You can create custom evaluators by writing Python functions that return a boolean or numeric value, which is recorded as the evaluation score. These evaluators can then be used to assess the outputs of different prompt versions.

For more detailed guidance, refer to the Phoenix documentation on using evaluators and testing prompts in the Playground. Would you like more detail on setting up these evaluators or using the Playground?

Sources:
- Testing a Prompt in the Playground
- Phoenix documentation on using evaluators
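As a rough sketch of what such evaluator functions can look like: plain Python functions that return a boolean or numeric score, run against each row's output and the golden answer. The function names and the `output`/`expected` parameter names below are illustrative assumptions, not a documented Phoenix signature, so check them against the Phoenix evaluator docs before wiring them into an experiment.

```python
# Hypothetical custom evaluators for comparing prompt outputs against
# a golden dataset. Each returns a value that would be recorded as the
# evaluation score: a boolean for pass/fail, or a float for a graded score.

def contains_expected(output: str, expected: str) -> bool:
    # Boolean score: does the model output include the golden answer?
    return expected.lower() in output.lower()

def length_ratio(output: str, expected: str) -> float:
    # Numeric score in [0, 1]: how close the output length is to the
    # golden answer's length (a crude proxy for verbosity control).
    if not expected:
        return 0.0
    return min(len(output) / len(expected), 1.0)
```

Functions like these can be passed as evaluators when running an experiment over a dataset, so each prompt version gets comparable scores per row.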
💡 Hint: Mention RunLLM in the thread for followups.
Hey Stuti P., server-side evaluators are on the roadmap but not shipped yet. John G. has some talks that show how you can run evaluators in a notebook to update your Playground experiments in the meantime.