Online evals in Arize AX are real-time, automated evaluations of LLM outputs run as traces are ingested, enabling immediate scoring, alerting, and monitoring in production. Phoenix (the open-source, self-hosted version) does not currently support true online evals: evaluations must be run in batches after traces are collected, typically via scheduled jobs or manual triggers, rather than automatically as data arrives. Phoenix is therefore limited to "offline" or "batch" evaluation, while Arize AX provides continuous, production-grade monitoring, with threshold-based alerts and dashboards for evaluation metrics such as faithfulness or correctness as soon as new data lands.
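To make the batch pattern concrete, here is a minimal Python sketch of an offline eval job. It is illustrative only: the `Span` records and the keyword-based `evaluate_faithfulness` scorer are toy stand-ins for spans pulled from a Phoenix server and an LLM-as-judge evaluator (e.g. via `phoenix.evals`), which in a real job would require a running Phoenix instance and model credentials.

```python
# Sketch of a batch ("offline") eval pass, as a scheduled job might run it.
# Span and evaluate_faithfulness are hypothetical stand-ins: a real job
# would fetch spans from Phoenix and score them with an LLM judge.
from dataclasses import dataclass

@dataclass
class Span:
    span_id: str
    input: str
    output: str

def evaluate_faithfulness(span: Span) -> dict:
    """Toy scorer: checks whether the input topic appears in the output."""
    faithful = span.input.lower() in span.output.lower()
    return {
        "span_id": span.span_id,
        "label": "faithful" if faithful else "unfaithful",
        "score": 1.0 if faithful else 0.0,
    }

def run_batch_eval(spans: list[Span]) -> list[dict]:
    """Score every collected span in one pass, then log results back."""
    return [evaluate_faithfulness(s) for s in spans]

# Hard-coded traces standing in for spans collected since the last run.
spans = [
    Span("s1", "paris", "The capital of France is Paris."),
    Span("s2", "berlin", "The capital of Germany is Munich."),
]
results = run_batch_eval(spans)
for r in results:
    print(r["span_id"], r["label"], r["score"])
```

The key difference from online evals is the trigger: here nothing runs until the job is invoked (by cron, a CI step, or by hand), so scores and alerts always lag trace ingestion by at least one batch interval.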
If you need real-time, always-on evaluation and alerting for production LLM traffic, this is only available in Arize AX. For most experimentation, prompt engineering, and batch evaluation use cases, Phoenix is sufficient and much easier to self-host. For more detail, see <https://arize.com/docs/phoenix/resources/frequently-asked-questions/langfuse-alternative-arize-phoenix-vs-langfuse-key-differences#ease-of-self-hosting|Phoenix vs. Langfuse/Arize AX comparison> and <https://arize.com/docs/phoenix/user-guide#panel-iterate-via-experiments-0|Phoenix User Guide>.
Would you like a step-by-step breakdown of how evaluations work in Phoenix and how to set up batch evals?
Sources:
- <https://arize.com/docs/phoenix/resources/frequently-asked-questions/langfuse-alternative-arize-phoenix-vs-langfuse-key-differences#ease-of-self-hosting|Phoenix vs. Langfuse/Arize AX comparison>
- <https://arize.com/docs/phoenix/user-guide#panel-iterate-via-experiments-0|Phoenix User Guide>