Hi folks! 👋 I'm currently exploring the feasibility of running evals using Phoenix — specifically trying to understand if Phoenix can support a centralized, Arize-like interface rather than requiring local SDK-based runs on individual machines. From what we’ve gathered so far:
The current Phoenix architecture seems to treat the UI purely as a visual layer, with all evaluation logic needing to run locally via the Python SDK (installed with pip install arize-phoenix).
The typical flow involves exporting trace data from Phoenix, running the eval logic (heuristics, LLM-as-a-judge, etc.) locally, and then pushing the annotated results back to Phoenix so they appear in the UI.
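To make that loop concrete, here is a minimal, self-contained sketch of the evaluate step. The span records and the contains_reference heuristic below are hypothetical stand-ins, not the SDK's API; in a real run you would export spans with the arize-phoenix client and push the labels back as span evaluations so they show up in the UI.

```python
# Hypothetical sketch of the "run eval logic locally" step.
# The exported span records and the heuristic are illustrative only.

def contains_reference(output: str, reference: str) -> str:
    """Toy heuristic eval: label 'correct' if the reference answer
    appears verbatim (case-insensitively) in the model output."""
    return "correct" if reference.lower() in output.lower() else "incorrect"

def evaluate_spans(spans: list[dict]) -> list[dict]:
    """Attach an eval label to each exported span record."""
    return [
        {
            "span_id": s["span_id"],
            "label": contains_reference(s["output"], s["reference"]),
        }
        for s in spans
    ]

if __name__ == "__main__":
    # Stand-in for spans exported from Phoenix.
    exported = [
        {"span_id": "a1", "output": "Paris is the capital of France.", "reference": "Paris"},
        {"span_id": "b2", "output": "I am not sure.", "reference": "Paris"},
    ]
    for r in evaluate_spans(exported):
        print(r["span_id"], r["label"])  # a1 correct / b2 incorrect
```

The same shape applies to LLM-as-a-judge evals: swap the heuristic for a model call, keep the span_id → label mapping, and log the results back to Phoenix.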
We’re referring to this notebook as our base setup: evals_quickstart.ipynb
Our main question: 👉 Is there any way to run these evaluations centrally or on a server, rather than requiring every team member to do this manually via the SDK on their own machine? If not currently supported, are there any best practices or community-led patterns for scaling this workflow in a team setting? Appreciate any clarification or guidance from folks who’ve tackled this before!
Nafeea, it sounds like you're asking about online evals: jobs that run on a central server as trace or dataset data is ingested. We support this today in the Arize Ax platform, but not in Phoenix yet. If you're looking for SaaS, or you're a business/enterprise team (Ax also supports self-hosting), Ax is a solid, market-leading option to check out. If you're passionate about open source and want to self-host OSS, Phoenix will get online evals; it's just a bit further out on our roadmap.
