Hi Dylan! Thank you very much for your prompt response and for the clarification about Phoenix's evaluation capabilities. The cron job automation approach sounds useful and I'll definitely check out that example and see how we can integrate it into our workflow. Appreciate the pointer to Phoenix Support as well - I'll reach out there if I run into any specific implementation questions. Thanks again for the help!
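For anyone following this thread, the cron-based automation mentioned above might look roughly like the crontab entry below. The script path, interpreter path, and log location are all placeholders, not anything from the Phoenix docs:

```shell
# Hypothetical crontab entry: run the team's Phoenix eval script nightly at 02:00
# and append stdout/stderr to a log file. All paths are placeholders.
0 2 * * * /usr/bin/python3 /opt/evals/run_phoenix_evals.py >> /var/log/phoenix_evals.log 2>&1
```

Running this on a shared server (rather than each teammate's laptop) is one common way to centralize SDK-based eval runs.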
Hi folks! 👋 I'm currently exploring the feasibility of running evals for our forecasting system using Phoenix — specifically trying to understand whether Phoenix can support a centralized, Arize-like interface rather than requiring local SDK-based runs on individual machines. From what we’ve gathered so far:
The current Phoenix architecture seems to treat the UI purely as a visual layer, with all evaluation logic run locally via the Python SDK (e.g., pip install arize-phoenix).
The typical flow involves exporting trace data from Phoenix, running the eval logic (heuristics, LLM-as-a-judge, etc.) locally, and then pushing the annotated results back to Phoenix so they appear in the UI.
We’re referring to this notebook as our base setup: evals_quickstart.ipynb
Our main question: 👉 Is there any way to run these evaluations centrally or on a server, rather than requiring every team member to do this manually via the SDK on their own machine? If not currently supported, are there any best practices or community-led patterns for scaling this workflow in a team setting? Appreciate any clarification or guidance from folks who’ve tackled this before!
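For context, here is a rough sketch of the export → evaluate locally → push-back loop described above. The heuristic eval and the sample data are toy stand-ins, and the Phoenix client calls shown in comments are based on the evals quickstart — exact names may differ by arize-phoenix version, so please treat them as assumptions to verify:

```python
# Sketch of the local eval flow: export spans, run eval logic, push results back.
import pandas as pd

def run_heuristic_eval(spans_df: pd.DataFrame) -> pd.DataFrame:
    """Toy heuristic eval: label spans whose output is empty."""
    results = pd.DataFrame(index=spans_df.index)
    results["label"] = spans_df["attributes.output.value"].apply(
        lambda out: "ok" if isinstance(out, str) and out.strip() else "empty"
    )
    results["score"] = (results["label"] == "ok").astype(int)
    return results

# 1) Export trace data from a running Phoenix server (per the quickstart; assumed API):
#        import phoenix as px
#        spans_df = px.Client().get_spans_dataframe()
# Stand-in for an exported spans DataFrame so this sketch runs self-contained:
spans_df = pd.DataFrame(
    {"attributes.output.value": ["The forecast is sunny.", ""]},
    index=["span-1", "span-2"],
)

# 2) Run the eval logic locally (heuristics here; could be LLM-as-a-judge instead):
evals_df = run_heuristic_eval(spans_df)

# 3) Push annotated results back so they appear in the Phoenix UI (assumed API):
#        from phoenix.trace import SpanEvaluations
#        px.Client().log_evaluations(
#            SpanEvaluations(eval_name="NonEmptyOutput", dataframe=evals_df)
#        )
```

Wrapping steps 1–3 in a script like this is exactly what a server-side cron job could run on the whole team's behalf, instead of each person executing the notebook locally.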
