Hi, does Phoenix allow running experiments with multi-step QAs?
Yes, Phoenix supports running experiments with multi-step QAs, including agent workflows and multi-turn conversations. You can define a task function that handles multi-step reasoning or agent execution, and evaluators can assess outputs at each step or for the final result. Phoenix also supports agent path and convergence evaluation, allowing you to analyze the efficiency and correctness of multi-step agent behavior in experiments. See examples of agent and multi-step evaluation in the documentation and cookbooks [source1] [source2] [source3]. Would you like more detail or a code example?

Sources:
- https://github.com/Arize-ai/phoenix/blob/main/tutorials/evals/evaluate_agent.ipynb
- https://github.com/Arize-ai/phoenix/blob/main/docs/section-integrations/evaluation-integrations/ragas.md
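Here is a minimal sketch of that pattern. The agent and the example data are hypothetical stand-ins, not Phoenix APIs; the idea is that the task function runs every step and returns only the result you want scored, and in Phoenix you would then pass `task` and the evaluator to `run_experiment` as shown in the linked notebooks.

```python
# Sketch of a multi-step experiment task and evaluator.
# `fake_agent` is a hypothetical stand-in for a real multi-step agent;
# in Phoenix you would pass `task` and `final_answer_correct` to
# run_experiment along with a dataset (see the linked docs).

def fake_agent(question: str) -> list[str]:
    """Pretend agent that 'reasons' in steps and returns all step outputs."""
    steps = [f"step 1: parse {question!r}", "step 2: retrieve context"]
    steps.append(f"final answer to {question!r}")
    return steps

def task(example: dict) -> str:
    """Experiment task: run every step, return only the final answer.
    Intermediate spans created inside this call nest under this task run."""
    steps = fake_agent(example["input"])
    return steps[-1]

def final_answer_correct(output: str, expected: str) -> float:
    """Evaluator: score only the final result of the multi-step run."""
    return 1.0 if expected.lower() in output.lower() else 0.0

# Wiring sketch, once a Phoenix dataset exists:
# from phoenix.experiments import run_experiment
# run_experiment(dataset, task=task, evaluators=[final_answer_correct])
```

You could also attach per-step evaluators instead of (or in addition to) the final-answer one; the task just has to surface whatever the evaluator needs.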
Hi Xander S., the simplest example is this case:
User: Hi
Agent: Hi
User: What is my name?
Agent: I don't know
User: My name is Ihor
Agent: Hi Ihor
User: What is my name
Agent: Your name is Ihor

Here is a sequence of 4 requests to the Agent. I want to evaluate only the last one. When running this task it will create 4 spans, I believe. How do I assign all four to one experiment data-sample task run?
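One way to keep all four turns under a single experiment run is to replay the whole conversation inside one task function, so every agent call (and its span, if traced) happens within that single task invocation, and return only the last reply for evaluation. This is a hedged sketch: `reply` is a toy stand-in for the real agent, and the evaluator name is made up for illustration.

```python
# Hedged sketch: replay the whole multi-turn conversation inside ONE task
# function so all agent calls happen within a single experiment task run.
# `reply` is a hypothetical stand-in for the real agent call.

def reply(history: list[tuple[str, str]], user_msg: str) -> str:
    """Toy agent: remembers a name if the user stated it in an earlier turn."""
    if user_msg.lower().startswith("my name is"):
        return f"Hi {user_msg.split()[-1]}"
    if "what is my name" in user_msg.lower():
        for role, msg in history:
            if role == "user" and msg.lower().startswith("my name is"):
                return f"Your name is {msg.split()[-1]}"
        return "I don't know"
    return "Hi"

def task(example: dict) -> str:
    """Run all turns in order; only the final agent reply is the output."""
    history: list[tuple[str, str]] = []
    last = ""
    for user_msg in example["turns"]:
        last = reply(history, user_msg)
        history.append(("user", user_msg))
        history.append(("agent", last))
    return last

def name_recalled(output: str) -> float:
    """Evaluator applied only to the final reply."""
    return 1.0 if "Ihor" in output else 0.0

# One dataset example holds the full 4-turn conversation:
example = {"turns": ["Hi", "What is my name?", "My name is Ihor", "What is my name"]}
```

With this shape, the dataset example is the whole conversation rather than one request, so the experiment records one task run per conversation and the evaluator sees only the final answer.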
