Hey, trying to use evals inside experiments in phoenix. Triggering experiment like first SS and the output of the eval looks like 2nd SS. Is there a way to add also add explanation for evaluations on experiment runs? If so, can anyone share an example?
Yes, you can include explanations in evaluations for experiment runs in Phoenix. When defining your evaluator, return a dictionary with keys like `label`, `score`, and `explanation`, or use the `EvaluationResult` object to include an explanation. This allows explanations to be attached and displayed with each evaluation result in the experiment output.

Example (Python):

```python
from phoenix.experiments import run_experiment
from phoenix.experiments.evaluators import create_evaluator

@create_evaluator(name="custom_eval", kind="CODE")
def my_eval(output, expected):
    matches = output == expected
    return {
        "label": "match" if matches else "no match",
        "score": 1 if matches else 0,
        "explanation": "Outputs match exactly." if matches else "Outputs differ.",
    }

experiment = run_experiment(dataset, task, evaluators=[my_eval])
```

This will add explanations to each evaluation in your experiment runs. For more, see Using Evaluators and the GitHub example. Would you like more detail or a step-by-step walkthrough?

Sources:
- GitHub example: evaluate_experiment.ts
- Using Evaluators
💡 Hint: Mention RunLLM in the thread for followups.
I'm getting this error when I return an evaluation response as defined by the bot:
If you replace all your `run_experiment` imports with `from phoenix.client.experiments import run_experiment` (and are on the latest version of arize-phoenix-client), this should work.
Cool, thanks. Am I no longer able to run it in a background thread? This worked when using `run_experiment` with arize-phoenix. Also, what's the difference between arize-phoenix and arize-phoenix-client?
Is this what happens when you try to Ctrl+C?
No, I am triggering an experiment run, which used to work fine. It is running as a background task; here is the function it is running:
what's weird is it works in the background if I switch back to the other import:
The biggest difference between the two is that we no longer try to start an event loop inside `run_experiment`. Instead we provide `run_experiment` and `async_run_experiment` if you want to run with concurrency, since maintaining a single entry point was inflexible for many use cases and had a lot of edge cases.
I'm not entirely sure why running in a thread isn't working for you, but if you need concurrency can you try using the async version instead?
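For reference, a minimal sketch of the async pattern being suggested. The task below is a stand-in; with arize-phoenix-client you would instead `await async_run_experiment(...)` from `phoenix.client.experiments` (name taken from the message above) inside your coroutine:

```python
import asyncio

# Stand-in async task. In a real setup you would replace this body with
# `await async_run_experiment(dataset=..., task=..., evaluators=[...])`.
async def run_task(example: str) -> str:
    await asyncio.sleep(0)  # simulate async work
    return example.upper()

async def main() -> list[str]:
    # Run several invocations concurrently, the way the async entry point
    # is meant to be driven from an existing event loop.
    return await asyncio.gather(*(run_task(x) for x in ["a", "b"]))

results = asyncio.run(main())
```

Because `async_run_experiment` is a coroutine, it composes with whatever event loop your background task already runs on, instead of trying to start its own.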
For now, if you need the original `run_experiment`: instead of returning a dictionary, return this object:

```python
from phoenix.client.resources.experiments.types import EvaluationResult

def evaluator(input, reference):
    ...
    return EvaluationResult(
        score=1.0,
        label="true",
        explanation="because I said so",
    )
```

I'll investigate why the new client version isn't working in a thread where the old one does.
Still no 🙁. Happy to hop on a call if it's easier for you.
The error you're getting is because the evaluator is returning a dict, so I think you might be running a script that hasn't picked up your evaluator changes
The `EvaluationResult` object you should be returning (as shown in the screenshot) is a dataclass.
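To illustrate the dict-vs-dataclass distinction, here is a minimal stdlib sketch. The `EvaluationResult` below is a stand-in dataclass mimicking the fields discussed above (score, label, explanation), not the real class from `phoenix.client.resources.experiments.types`:

```python
from dataclasses import dataclass
from typing import Optional

# Stand-in mirroring the assumed shape of phoenix's EvaluationResult dataclass.
@dataclass
class EvaluationResult:
    score: Optional[float] = None
    label: Optional[str] = None
    explanation: Optional[str] = None

def evaluator(output, expected):
    matches = output == expected
    # Return the dataclass instance, not a plain dict, so the client
    # recognizes the evaluation payload (per the thread above).
    return EvaluationResult(
        score=1.0 if matches else 0.0,
        label="match" if matches else "no match",
        explanation="Outputs match exactly." if matches else "Outputs differ.",
    )

result = evaluator("a", "a")
```

If the running script still returns a dict, make sure the process was restarted so it picks up the evaluator change.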
