when comparing two experiment results, is there a way to filter rows where two experiment results don't match? the screenshot is from https://youtu.be/xPzlko_pkyg?si=EM7NsN731VNgeBu4&t=421, but more recent phoenix version also has a filter bar like attached, I wonder if it's possible to do such filtering in the filter bar.
Yes, Phoenix's experiment comparison UI supports a filter bar that accepts Python-like boolean expressions to filter rows. You can filter on experiment run attributes, including comparing values between experiments, such as checking if the outputs differ. For example, you can use a filter like experiments[0].output != experiments[1].output to show only rows where the outputs of the two experiments do not match. This syntax is supported in the filter bar for experiment comparison as of recent versions (test_experiment_run_filters.py, ExperimentCompareTable.tsx).
Would you like more detail or example filter expressions?
Sources:
- test_experiment_run_filters.py
- ExperimentCompareTable.tsx
💡 Hint: Mention RunLLM in the thread for followups.
say my output for each row in an experiment is like
{"order_id":"185386615","customer_name":"NA"}
i'd like to get the rows where order_id doesn't match between two experiments. how to do so?
or to make it even simpler first, how to get rows where the outputs don't match overall?
I know how to do so if these results are in a pandas dataframe, but not sure if phoenix UI provides such flexibility.
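For reference, the pandas version of the field-level comparison mentioned above could look roughly like the sketch below. The column names (example_id, output), the helper name field_mismatches, and the sample rows are made up for illustration; they are not what Phoenix exports. The outputs are assumed to be JSON strings shaped like the one shown earlier.

```python
import json

import pandas as pd

# Hypothetical per-example outputs from two experiments (JSON strings).
runs_a = pd.DataFrame(
    {
        "example_id": ["ex1", "ex2", "ex3"],
        "output": [
            '{"order_id":"185386615","customer_name":"NA"}',
            '{"order_id":"200000001","customer_name":"NA"}',
            '{"order_id":"300000001","customer_name":"Ann"}',
        ],
    }
)
runs_b = pd.DataFrame(
    {
        "example_id": ["ex1", "ex2", "ex3"],
        "output": [
            '{"order_id":"185386615","customer_name":"NA"}',
            '{"order_id":"200000002","customer_name":"NA"}',
            '{"order_id":"300000001","customer_name":"Bob"}',
        ],
    }
)


def field_mismatches(a: pd.DataFrame, b: pd.DataFrame, field: str) -> pd.DataFrame:
    """Return the joined rows where `field` differs between the two runs."""
    # Align the two experiments on the dataset example they ran against.
    merged = a.merge(b, on="example_id", suffixes=("_a", "_b"))
    # Parse each JSON string and pull out the field being compared.
    value_a = merged["output_a"].map(lambda s: json.loads(s)[field])
    value_b = merged["output_b"].map(lambda s: json.loads(s)[field])
    return merged[value_a != value_b]


print(field_mismatches(runs_a, runs_b, "order_id")["example_id"].tolist())
```

Comparing the whole output instead of one field would just be merged["output_a"] != merged["output_b"] on the joined frame.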
You should be able to do something like
experiments[0].output["order_id"] != experiments[1].output["order_id"]
I see. experiments[0].output != experiments[1].output works, now I'd like to get to the field level comparison if possible.
experiments[0].output["order_id"] != experiments[1].output["order_id"]
doesn't work; it returns no rows and doesn't complain about a syntax error either
that assumes the output is a dictionary? my output is a json tho
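The dictionary-versus-JSON distinction raised here can be shown in plain Python. The sketch below only demonstrates Python's own behavior: a JSON string cannot be indexed by key until it is parsed. Whether Phoenix's filter bar treats string outputs the same way is an assumption, not something confirmed in this thread, and the lookup helper is hypothetical.

```python
import json

# The task output as a JSON string, i.e. what a `-> str` task would return.
raw_output = '{"order_id":"185386615","customer_name":"NA"}'


def lookup(output, key):
    """Try a key lookup and return None when the value cannot be indexed by key."""
    try:
        return output[key]
    except TypeError:
        # A str only accepts integer or slice indices, so output["order_id"] fails.
        return None


# Parsing turns the JSON string into a dict, which does support key lookup.
parsed_output = json.loads(raw_output)

print(lookup(raw_output, "order_id"))
print(lookup(parsed_output, "order_id"))
```

If that is what is happening, one thing to try is returning the parsed dictionary (for example via json.loads) from the task instead of the raw string, so the stored output is a JSON object rather than a string.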
How are you logging your experiment results?
Maybe you can show me your experiment code.
actually, I can replace order_id with an arbitrary key, e.g.
experiments[0].output['foo'] != experiments[1].output['foo']
and it doesn't complain either, and returns no rows.
Not sure I follow. Does that filter condition work?
the definition of task to run_experiment is like
def call_llm(example: Example) -> str:
    formatted_prompt = prompt.format(variables={"user_msg": example.input["user_msg"]})
    resp = llm_client.beta.chat.completions.parse(
        messages=formatted_prompt.messages,
        model=_MODEL,
        temperature=0,
        max_tokens=1024,
        response_format=Foo,
    )
    return resp.choices[0].message.content
Does that filter condition work?
no it doesn't, it doesn't complain or return anything, e.g.
