Hi Dylan! Thank you very much for your prompt response and for the clarification about Phoenix's evaluation capabilities. The cron job automation approach sounds useful and I'll definitely check out that example and see how we can integrate it into our workflow. Appreciate the pointer to Phoenix Support as well - I'll reach out there if I run into any specific implementation questions. Thanks again for the help!
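For anyone following this thread, the cron-based automation mentioned above might look roughly like the crontab entry below. The script path, interpreter path, and log location are all placeholders, not anything from the Phoenix docs:

```shell
# Hypothetical crontab entry: run the team's Phoenix eval script nightly at 02:00
# and append stdout/stderr to a log file. All paths are placeholders.
0 2 * * * /usr/bin/python3 /opt/evals/run_phoenix_evals.py >> /var/log/phoenix_evals.log 2>&1
```

Running this on a shared server (rather than each teammate's laptop) is one common way to centralize SDK-based eval runs.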
Hi folks! 👋 I'm currently exploring the feasibility of running evals for our forecasting system using Phoenix — specifically trying to understand whether Phoenix can support a centralized, Arize-like interface rather than requiring local SDK-based runs on individual machines. From what we’ve gathered so far:
The current Phoenix architecture seems to treat the UI purely as a visual layer, with all evaluation logic run locally via the Python SDK (e.g., pip install arize-phoenix).
The typical flow involves exporting trace data from Phoenix, running the eval logic (heuristics, LLM-as-a-judge, etc.) locally, and then pushing the annotated results back to Phoenix so they appear in the UI.
We’re referring to this notebook as our base setup: evals_quickstart.ipynb
Our main question: 👉 Is there any way to run these evaluations centrally or on a server, rather than requiring every team member to do this manually via the SDK on their own machine? If not currently supported, are there any best practices or community-led patterns for scaling this workflow in a team setting? Appreciate any clarification or guidance from folks who’ve tackled this before!
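For context, here is a rough sketch of the export → evaluate locally → push-back loop described above. The heuristic eval and the sample data are toy stand-ins, and the Phoenix client calls shown in comments are based on the evals quickstart — exact names may differ by arize-phoenix version, so please treat them as assumptions to verify:

```python
# Sketch of the local eval flow: export spans, run eval logic, push results back.
import pandas as pd

def run_heuristic_eval(spans_df: pd.DataFrame) -> pd.DataFrame:
    """Toy heuristic eval: label spans whose output is empty."""
    results = pd.DataFrame(index=spans_df.index)
    results["label"] = spans_df["attributes.output.value"].apply(
        lambda out: "ok" if isinstance(out, str) and out.strip() else "empty"
    )
    results["score"] = (results["label"] == "ok").astype(int)
    return results

# 1) Export trace data from a running Phoenix server (per the quickstart; assumed API):
#        import phoenix as px
#        spans_df = px.Client().get_spans_dataframe()
# Stand-in for an exported spans DataFrame so this sketch runs self-contained:
spans_df = pd.DataFrame(
    {"attributes.output.value": ["The forecast is sunny.", ""]},
    index=["span-1", "span-2"],
)

# 2) Run the eval logic locally (heuristics here; could be LLM-as-a-judge instead):
evals_df = run_heuristic_eval(spans_df)

# 3) Push annotated results back so they appear in the Phoenix UI (assumed API):
#        from phoenix.trace import SpanEvaluations
#        px.Client().log_evaluations(
#            SpanEvaluations(eval_name="NonEmptyOutput", dataframe=evals_df)
#        )
```

Wrapping steps 1–3 in a script like this is exactly what a server-side cron job could run on the whole team's behalf, instead of each person executing the notebook locally.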
