New drop for everyone building agents: Session-Level Evaluations
You can now see how your agent does across a full convo, not just one turn.
What you can measure:
Coherence (is it consistent?)
🧩 Context retention (is it remembering past turns?)
🎯 Goal achievement (does the user get what they came for?)
🛤️ Multi-step progression (can it handle complex tasks smoothly?)
Perfect for those of you building multi-turn workflows where step-by-step checks aren’t enough.
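Rough idea of what "session-level" means in practice: a minimal Python sketch, not the Arize API. The record fields (`session_id`, `turn`, `user`, `agent`) and the helper names are made up for illustration. It just groups turn-level records by session and flattens each session into one transcript that a judge (LLM or human) could score for coherence, context retention, or goal achievement.

```python
from collections import defaultdict

# Hypothetical turn-level records; field names are assumptions, not a real schema.
turns = [
    {"session_id": "s1", "turn": 2, "user": "Make it for Friday.", "agent": "Booked for Friday."},
    {"session_id": "s1", "turn": 1, "user": "Book a table for two.", "agent": "Which day?"},
    {"session_id": "s2", "turn": 1, "user": "Cancel my order.", "agent": "Done."},
]

def group_by_session(records):
    """Collect turns per session and sort them into conversation order."""
    sessions = defaultdict(list)
    for record in records:
        sessions[record["session_id"]].append(record)
    return {sid: sorted(ts, key=lambda t: t["turn"]) for sid, ts in sessions.items()}

def to_transcript(session_turns):
    """Flatten one session into a single text that a judge can score as a whole."""
    lines = []
    for t in session_turns:
        lines.append(f"User: {t['user']}")
        lines.append(f"Agent: {t['agent']}")
    return "\n".join(lines)

sessions = group_by_session(turns)
transcripts = {sid: to_transcript(ts) for sid, ts in sessions.items()}
print(transcripts["s1"])
```

The point is that the evaluator sees the whole conversation at once, so it can catch things a per-turn check can't, like the agent forgetting "table for two" by turn 2.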
Full Guide: https://arize.com/docs/ax/cookbooks/evaluation/session-level-evaluations
Drop your questions or what you’re excited to test with this! ✨