Hey, I'm trying to use evals inside experiments in Phoenix. I'm triggering the experiment as in the first screenshot, and the output of the eval looks like the second screenshot. Is there a way to also add an explanation for evaluations on experiment runs? If so, can anyone share an example?
I'm getting this error when I return an evaluation response as defined by the robot
If you replace all your run_experiment imports with `from phoenix.client.experiments import run_experiment` (and are on the latest version of arize-phoenix-client), this should work
Cool, thanks. Am I no longer able to run it as a background thread? This worked when using run_experiment with arize-phoenix. Also, what's the difference between arize-phoenix and arize-phoenix-client?
Is this what happens when you try to Ctrl+C?
No, I am triggering an experiment run, which used to work fine. It runs as a background task; here is the function it is running:
what's weird is it works in the background if I switch back to the other import:
The biggest difference between the two is that the run_experiment function no longer tries to start an event loop when possible. Instead, we provide both run_experiment and async_run_experiment, the latter for when you want to run with concurrency. Maintaining a single entry point was inflexible for many use cases and had a lot of edge cases
I'm not entirely sure why running in a thread isn't working for you, but if you need concurrency can you try using the async version instead?
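For reference, a rough sketch of how an async entry point can be driven from a background thread. `fake_async_run_experiment` is a hypothetical stand-in, not the Phoenix API, and the real async_run_experiment takes different arguments. The only point is that `asyncio.run` creates and owns its own event loop inside the worker thread:

```python
import asyncio
import threading


async def fake_async_run_experiment(examples):
    """Hypothetical stand-in for an async experiment runner (NOT the Phoenix API)."""

    async def run_one(example):
        await asyncio.sleep(0)  # placeholder for real async work (e.g. an LLM call)
        return example.upper()

    # Run all examples concurrently; gather preserves input order.
    return await asyncio.gather(*(run_one(e) for e in examples))


def run_in_background(examples, results):
    """Thread target: asyncio.run starts a fresh event loop in this thread."""
    results.extend(asyncio.run(fake_async_run_experiment(examples)))


results = []
worker = threading.Thread(target=run_in_background, args=(["a", "b"], results))
worker.start()
worker.join()
```

Whether this pattern fixes the thread issue with the new client is an assumption, not something confirmed in this thread.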
For now, if you need the original run_experiment, return this object instead of a dictionary:
from phoenix.client.resources.experiments.types import EvaluationResult

def evaluator(input, reference):
    ...
    return EvaluationResult(
        score=1.0,
        label="true",
        explanation="because I said so"
    )

I'll investigate why the new client version isn't working in a thread where the old one does
Still no 😕 Happy to hop on a call if it's easier for you
The error you're getting is because the evaluator is returning a dict, so I think you might be running a script that hasn't picked up your evaluator changes
the EvaluationResult object you should be returning (as shown in the screenshot) is a dataclass
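A quick way to check which kind of object an evaluator is actually returning, as a minimal sketch. `StandInEvaluationResult` is a hypothetical stand-in whose field names mirror the snippet above, so this runs without Phoenix installed; it is not the library's class:

```python
from dataclasses import dataclass, is_dataclass
from typing import Optional


@dataclass
class StandInEvaluationResult:
    """Hypothetical stand-in; field names copied from the snippet in this thread."""
    score: Optional[float] = None
    label: Optional[str] = None
    explanation: Optional[str] = None


def old_evaluator(input, reference):
    # Returns a plain dict, the shape that triggers the error discussed above.
    return {"score": 1.0, "label": "true", "explanation": "because I said so"}


def new_evaluator(input, reference):
    # Returns a dataclass instance instead of a dict.
    return StandInEvaluationResult(
        score=1.0, label="true", explanation="because I said so"
    )


def returns_dataclass(evaluator):
    """True if the evaluator returns a dataclass instance (not a dict or a class)."""
    result = evaluator("some input", "some reference")
    return is_dataclass(result) and not isinstance(result, type)
```

If `returns_dataclass` is False for the evaluator the script imports, the running process is probably still using the old dict-returning version.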
