Hey team, 👋
I'm currently exploring Phoenix for tracing and evaluations. The built-in eval templates are helpful, but our use case needs more complex logic than the current templates or the custom evaluators (categorical/numerical) support. Specifically, we want to define our own structured evaluation logic (e.g., diffing nested JSONs) and ideally log the results fully integrated into Phoenix: visible in the UI, linked to traces, and treated just like native evaluation results.
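For context, here's a rough sketch of the kind of evaluator I mean: a plain-Python JSON-diff judge that emits `label`/`score`/`explanation` records per span. My (untested) assumption is that records like this could then go into a DataFrame indexed by `context.span_id` and be logged with `px.Client().log_evaluations(SpanEvaluations(...))` so they show up next to native evals:

```python
import json

def diff_json(expected: dict, actual: dict, path: str = "") -> list[str]:
    """Recursively collect mismatched key paths between two nested JSON objects."""
    mismatches = []
    for key in expected.keys() | actual.keys():
        sub_path = f"{path}.{key}" if path else str(key)
        if key not in actual:
            mismatches.append(f"missing: {sub_path}")
        elif key not in expected:
            mismatches.append(f"unexpected: {sub_path}")
        elif isinstance(expected[key], dict) and isinstance(actual[key], dict):
            mismatches.extend(diff_json(expected[key], actual[key], sub_path))
        elif expected[key] != actual[key]:
            mismatches.append(f"value: {sub_path}")
    return mismatches

def evaluate_span(expected_json: str, actual_json: str) -> dict:
    """Turn a JSON diff into a Phoenix-style eval record (label/score/explanation)."""
    mismatches = diff_json(json.loads(expected_json), json.loads(actual_json))
    return {
        "label": "match" if not mismatches else "mismatch",
        "score": 1.0 if not mismatches else 0.0,
        "explanation": "; ".join(sorted(mismatches)) or "exact structural match",
    }

# Hypothetical next step (names from my reading of the Phoenix docs, not verified):
# build a DataFrame of these records indexed by context.span_id, then
#   px.Client().log_evaluations(SpanEvaluations(eval_name="json_diff", dataframe=df))
```

The diff/scoring part works standalone; it's the "log it so Phoenix treats it as a first-class eval" step I'm unsure about.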
Has anyone tried bypassing the default evaluator path to implement this kind of custom logic? Would love to hear how others have approached more advanced or structured judgment tasks.
Thanks in advance!