Phoenix does not natively support running evaluations centrally from the UI; all evaluation logic (including LLM-as-a-judge or heuristics) must be executed via the Python SDK or scripts, with results then logged back to Phoenix for visualization. The UI is strictly a visualization and annotation layer, not an execution environment for evals or batch jobs; this is confirmed in the evals_quickstart notebook and official docs.

For centralized or team-scale workflows, the recommended pattern is to set up a shared server (or scheduled job, e.g., with cron) that periodically pulls new traces, runs evaluations, and logs results back to Phoenix; see the online evals guide and example script. This allows teams to automate and centralize evaluation runs, avoiding the need for each user to run SDK code locally.

Would you like more detail on how to set up such a centralized evaluation workflow?

Sources:
- Online Evals | Phoenix Docs
- Online Evals Example Script
- evals_quickstart notebook
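The pull-evaluate-log loop described above can be sketched as a small script you schedule with cron. This is a hedged sketch, not an official example: it assumes the `arize-phoenix` SDK (`px.Client`, `get_spans_dataframe`, `log_evaluations`, `SpanEvaluations`), a locally running Phoenix server at an assumed endpoint, and a toy keyword heuristic standing in for a real eval (LLM-as-a-judge or otherwise). Column names like `attributes.output.value` may differ in your trace schema.

```python
# Sketch of a scheduled evaluation job (e.g. run hourly via cron) that pulls
# traces from a Phoenix server, scores them, and logs results back.
# Assumed: the `arize-phoenix` SDK and a server at the endpoint below.

def keyword_eval(output: str) -> tuple[str, float]:
    """Toy heuristic eval: flag responses that look like refusals.

    Stand-in for a real evaluator (e.g. an LLM-as-a-judge call).
    Returns a (label, score) pair.
    """
    refusal_markers = ("i cannot", "i can't", "as an ai")
    text = (output or "").lower()
    if any(marker in text for marker in refusal_markers):
        return "refusal", 0.0
    return "ok", 1.0


def main() -> None:
    # SDK imports are kept local so the heuristic above stays importable
    # and testable without a Phoenix installation.
    import pandas as pd
    import phoenix as px
    from phoenix.trace import SpanEvaluations

    client = px.Client(endpoint="http://localhost:6006")  # assumed endpoint
    spans = client.get_spans_dataframe()  # pull recorded traces

    # Score each span's output; column name is an assumption about the schema.
    results = spans["attributes.output.value"].map(keyword_eval)
    evals = pd.DataFrame(
        {
            "label": results.map(lambda pair: pair[0]),
            "score": results.map(lambda pair: pair[1]),
        },
        index=spans.index,  # index must be span IDs so Phoenix can join evals
    )

    # Log results back so they appear alongside traces in the Phoenix UI.
    client.log_evaluations(
        SpanEvaluations(eval_name="refusal_check", dataframe=evals)
    )


if __name__ == "__main__":
    main()
```

Keeping the eval function pure (plain text in, label/score out) makes it easy to unit-test and to swap for an LLM-based judge later without touching the pull/log plumbing.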
It's not built into Phoenix, but having this in a separate service isn't terrible
Thanks Ben M. for the comment. In general, Phoenix doesn't currently run evals as part of a job; it relies on you to pull data and run evaluations on a regular cadence. We will add this level of online evaluation, but we want to iron out all our evals first. If you are looking for evals as a managed service, you can try out Arize Ax, which does have job infrastructure.
