To set up custom evaluators in Phoenix, write Python functions that return a boolean or numeric value; the returned value is recorded as the evaluation score. You can then use these evaluators to assess the outputs of different prompt versions. The Prompt Playground complements this: it lets you run prompts over datasets, evaluate their performance, and compare prompt versions side by side.
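As a rough illustration of the "function that returns a boolean or numeric value" idea, here is a minimal sketch in plain Python. The function names, the parameter names `output` and `expected`, and the assumption that both are plain strings are hypothetical and chosen for illustration; check the evaluator docs linked below for how Phoenix actually passes arguments in your version.

```python
# Hypothetical custom evaluators: plain Python functions that score one output.
# The parameter names `output` and `expected` and their string types are
# assumptions made for this sketch, not confirmed Phoenix requirements.


def contains_expected(output: str, expected: str) -> bool:
    """Boolean score: True if the expected answer appears in the output."""
    return expected.strip().lower() in output.strip().lower()


def length_ratio(output: str, expected: str) -> float:
    """Numeric score in [0, 1]: ratio of the shorter length to the longer one."""
    if not output or not expected:
        return 0.0
    return min(len(output), len(expected)) / max(len(output), len(expected))


if __name__ == "__main__":
    # Quick local check before wiring the functions into an experiment.
    print(contains_expected("The capital of France is Paris.", "paris"))
    print(length_ratio("abcd", "ab"))
```

Functions like these would presumably be handed to an experiment run, for example through an `evaluators` argument, so that each prompt version's outputs get scored the same way. Treat that wiring as an assumption and confirm it against the evaluator documentation.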
For more detailed guidance, you can refer to the <https://docs.arize.com/phoenix/datasets-and-experiments/how-to-experiments/using-evaluators|Phoenix documentation on using evaluators> and <https://docs.arize.com/phoenix/prompt-engineering/how-to-prompts/test-a-prompt#S%3A3|testing prompts in the Playground>. Would you like more detail on setting up these evaluators or using the Playground?
Sources:
- <https://docs.arize.com/phoenix/prompt-engineering/how-to-prompts/test-a-prompt#S%3A3|Testing a Prompt in the Playground>
- <https://docs.arize.com/phoenix/datasets-and-experiments/how-to-experiments/using-evaluators|Phoenix documentation on using evaluators>