We have a new demo video + blog + cookbook on how to evaluate tool-calling agents.
Phoenix includes two prebuilt LLM-as-a-judge evaluators specifically for this, plus a full evaluation workflow in the UI that lets you write prompts, run experiments, add evaluators, and compare results without writing any code.
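If you'd rather start in code, here's a minimal sketch of running a prebuilt tool-calling judge with phoenix.evals. The dataframe column names (question, tool_call, tool_definitions) and the gpt-4o judge model are assumptions; check the template string for the exact variables your version expects.

```python
# Minimal sketch (assumptions noted below): score tool calls with
# Phoenix's prebuilt tool-calling judge via llm_classify.
import pandas as pd
from phoenix.evals import (
    OpenAIModel,
    TOOL_CALLING_PROMPT_RAILS_MAP,
    TOOL_CALLING_PROMPT_TEMPLATE,
    llm_classify,
)

# Assumption: the template expects {question}, {tool_call}, and
# {tool_definitions}; column names must match the template's variables.
df = pd.DataFrame(
    {
        "question": ["Book me a flight from SFO to JFK next Friday"],
        "tool_call": ['search_flights(origin="SFO", destination="JFK", date="next Friday")'],
        "tool_definitions": ["search_flights(origin: str, destination: str, date: str): searches available flights"],
    }
)

results = llm_classify(
    dataframe=df,
    template=TOOL_CALLING_PROMPT_TEMPLATE,
    model=OpenAIModel(model="gpt-4o"),
    rails=list(TOOL_CALLING_PROMPT_RAILS_MAP.values()),  # e.g. ["correct", "incorrect"]
    provide_explanation=True,  # keep the judge's reasoning for debugging
)
print(results[["label", "explanation"]])
```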
This new tutorial + companion notebook walks through the full workflow using a travel assistant demo: what the evaluators measure, how to validate that they align with human judgment, and how to use the results to improve both your assistant prompt and your evaluators.
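Validating alignment comes down to comparing the judge's labels against a handful of hand labels before trusting it at scale. A hypothetical sketch (labels and column names are made up for illustration):

```python
# Hypothetical sketch: measure agreement between judge output and
# human labels on a small hand-labeled sample.
import pandas as pd

judged = pd.DataFrame(
    {
        "label": ["correct", "incorrect", "correct"],      # judge output
        "human_label": ["correct", "correct", "correct"],  # your hand labels
    }
)
agreement = (judged["label"] == judged["human_label"]).mean()
print(f"judge/human agreement: {agreement:.0%}")  # -> 67%
```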
https://arize.com/blog/how-to-evaluate-tool-calling-agents/