New drop for everyone building agents: Session-Level Evaluations
You can now see how your agent does across a full convo, not just one turn.
What you can measure:
Coherence (is it consistent?)
🧩 Context retention (is it remembering past turns?)
🎯 Goal achievement (does the user get what they came for?)
🛤️ Multi-step progression (can it handle complex tasks smoothly?)
Perfect for those of you building multi-turn workflows where step-by-step checks aren't enough.
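If you want a feel for the idea before opening the guide, here is a rough sketch of the first step: grouping per-turn records by session and stitching each group into one transcript that a session-level check could read. This is plain Python, not the Arize API. The field names (`session_id`, `turn_index`, `user`, `agent`) and the helper `build_session_transcripts` are made up for illustration.

```python
from collections import defaultdict


def build_session_transcripts(turns):
    """Group per-turn records by session and join each group into one transcript.

    The record shape used here is an assumption for this sketch, not a real schema.
    """
    sessions = defaultdict(list)
    for turn in turns:
        sessions[turn["session_id"]].append(turn)

    transcripts = {}
    for session_id, session_turns in sessions.items():
        # Put the turns in conversation order before joining them.
        session_turns.sort(key=lambda t: t["turn_index"])
        lines = []
        for t in session_turns:
            lines.append(f"user: {t['user']}")
            lines.append(f"agent: {t['agent']}")
        transcripts[session_id] = "\n".join(lines)
    return transcripts


# Hypothetical turn records, deliberately out of order.
turns = [
    {"session_id": "s1", "turn_index": 1,
     "user": "Make it for two people.", "agent": "Updated to two guests."},
    {"session_id": "s1", "turn_index": 0,
     "user": "Book a table for Friday.", "agent": "What time works?"},
    {"session_id": "s2", "turn_index": 0,
     "user": "Cancel my order.", "agent": "Done, order cancelled."},
]

transcripts = build_session_transcripts(turns)
print(transcripts["s1"])
```

Each transcript is the unit you would score for things like context retention or goal achievement, instead of scoring every turn on its own.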
Full Guide: https://arize.com/docs/ax/cookbooks/evaluation/session-level-evaluations
Drop your questions or what you're excited to test with this! ✨