Hi Team, we have multiple multi-agentic systems built using the Agno framework, and we want to evaluate them. How can we evaluate these multi-agentic chatbots or systems? Is there any support for this in Arize or Phoenix? For simple RAG, we evaluate based on the question, answer, and context. But how should we evaluate multi-agentic systems? What metrics are relevant here, especially for Agno agents? Please guide us.
Phoenix provides built-in support for tracing and evaluating Agno-based multi-agent systems. You can instrument your Agno agents with the OpenInference package, which streams agent interactions, tool usage, and reasoning steps to Phoenix for observability and evaluation. This lets you monitor, debug, and assess agent behavior in real time, including multi-agent conversations and workflows.

Beyond standard RAG metrics (retrieval relevance, response correctness), relevant metrics for multi-agentic systems include:

1. Router accuracy: did the agent choose the right next step or tool?
2. Parameter extraction correctness: were tool arguments extracted correctly?
3. Plan quality: is the agent's plan valid and efficient?
4. Skill/tool execution correctness: did each tool call produce the right result?
5. Path efficiency: did the agent take the optimal number of steps?
6. Reflection: can the agent self-correct or retry when needed?

Phoenix provides evaluation templates and code examples for each of these, and you can combine them for comprehensive evaluation. For Agno, these evaluations can be applied at the agent, skill, and path levels using Phoenix's evaluation templates and experiment features; all agent traces are captured automatically once tracing is enabled, and results can be visualized in the Phoenix UI or exported for further analysis. See the Agno tracing integration guide and the agent evaluation documentation for details.

Would you like a step-by-step example or code snippet for setting this up?

Sources:
- Agno Tracing Integration
- Agent Evaluation Guide
- Agent Evaluation Templates
- Agent Cookbooks
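The instrumentation step described above can be sketched as follows. This is a minimal setup sketch, assuming the `arize-phoenix-otel` and `openinference-instrumentation-agno` packages are installed and a Phoenix instance is running at the default local address; the project name is a placeholder.

```python
# Minimal sketch: enable Phoenix tracing for Agno agents via OpenInference.
from phoenix.otel import register
from openinference.instrumentation.agno import AgnoInstrumentor

# Register an OTLP tracer provider pointed at the local Phoenix collector.
tracer_provider = register(
    project_name="agno-multi-agent",            # hypothetical project name
    endpoint="http://localhost:6006/v1/traces",  # default local Phoenix endpoint
)

# Instrument Agno: agent runs, tool calls, and reasoning steps are now
# exported as spans and appear in the Phoenix UI for evaluation.
AgnoInstrumentor().instrument(tracer_provider=tracer_provider)
```

Once this runs at startup, every subsequent Agno agent or team invocation is traced without further code changes.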
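To make the path-efficiency metric concrete, here is an illustrative, hypothetical helper (not a Phoenix API): it assumes you have already extracted each run's ordered step names from its trace spans and compares them against a reference path.

```python
# Hypothetical path-efficiency scorer for an agent trace.
# Assumes `actual_steps` is the ordered list of step/tool names the agent
# took, and `optimal_steps` is a hand-labeled reference path for the task.

def path_efficiency(actual_steps: list[str], optimal_steps: list[str]) -> float:
    """Ratio of reference step count to actual step count, capped at 1.0.

    1.0 means the agent took no more steps than the reference path;
    lower values indicate repeated or wasted steps.
    """
    if not actual_steps:
        return 0.0
    return min(1.0, len(optimal_steps) / len(actual_steps))

def took_optimal_path(actual_steps: list[str], optimal_steps: list[str]) -> bool:
    """Strict check: the agent followed exactly the reference path."""
    return actual_steps == optimal_steps

# Example: the agent retried a search before answering.
actual = ["router", "web_search", "web_search", "answer"]
optimal = ["router", "web_search", "answer"]
print(path_efficiency(actual, optimal))    # 3/4 = 0.75
print(took_optimal_path(actual, optimal))  # False
```

Scores like this can be logged back to Phoenix as evaluation annotations alongside LLM-judged metrics such as router accuracy.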
