Hi, does Phoenix allow running experiments with multi-step QAs?
Yes, Phoenix supports running experiments with multi-step QAs, including agent workflows and multi-turn conversations. You can define a task function that handles multi-step reasoning or agent execution, and evaluators can assess outputs at each step or for the final result. Phoenix also supports agent path and convergence evaluation, allowing you to analyze the efficiency and correctness of multi-step agent behavior in experiments. See examples of agent and multi-step evaluation in the documentation and cookbooks [source1] [source2] [source3]. Would you like more detail or a code example?

Sources:
- https://github.com/Arize-ai/phoenix/blob/main/tutorials/evals/evaluate_agent.ipynb
- https://github.com/Arize-ai/phoenix/blob/main/docs/section-integrations/evaluation-integrations/ragas.md
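Here is a minimal sketch of that pattern. The agent and the example data are hypothetical stand-ins, not Phoenix APIs; the idea is that the task function runs every step and returns only the result you want scored, and in Phoenix you would then pass `task` and the evaluator to `run_experiment` as shown in the linked notebooks.

```python
# Sketch of a multi-step experiment task and evaluator.
# `fake_agent` is a hypothetical stand-in for a real multi-step agent;
# in Phoenix you would pass `task` and `final_answer_correct` to
# run_experiment along with a dataset (see the linked docs).

def fake_agent(question: str) -> list[str]:
    """Pretend agent that 'reasons' in steps and returns all step outputs."""
    steps = [f"step 1: parse {question!r}", "step 2: retrieve context"]
    steps.append(f"final answer to {question!r}")
    return steps

def task(example: dict) -> str:
    """Experiment task: run every step, return only the final answer.
    Intermediate spans created inside this call nest under this task run."""
    steps = fake_agent(example["input"])
    return steps[-1]

def final_answer_correct(output: str, expected: str) -> float:
    """Evaluator: score only the final result of the multi-step run."""
    return 1.0 if expected.lower() in output.lower() else 0.0

# Wiring sketch, once a Phoenix dataset exists:
# from phoenix.experiments import run_experiment
# run_experiment(dataset, task=task, evaluators=[final_answer_correct])
```

You could also attach per-step evaluators instead of (or in addition to) the final-answer one; the task just has to surface whatever the evaluator needs.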
Hi Xander S., the simplest example is this case:
User: Hi
Agent: Hi
User: What is my name?
Agent: I don't know
User: My name is Ihor
Agent: Hi Ihor
User: What is my name
Agent: Your name is Ihor

Here is a sequence of 4 requests to the Agent. I want to evaluate only the last one. When running this task it will create 4 spans, I believe. How do I assign all four to one experiment data-sample task run?
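One way to keep all four turns under a single experiment run is to replay the whole conversation inside one task function, so every agent call (and its span, if traced) happens within that single task invocation, and return only the last reply for evaluation. This is a hedged sketch: `reply` is a toy stand-in for the real agent, and the evaluator name is made up for illustration.

```python
# Hedged sketch: replay the whole multi-turn conversation inside ONE task
# function so all agent calls happen within a single experiment task run.
# `reply` is a hypothetical stand-in for the real agent call.

def reply(history: list[tuple[str, str]], user_msg: str) -> str:
    """Toy agent: remembers a name if the user stated it in an earlier turn."""
    if user_msg.lower().startswith("my name is"):
        return f"Hi {user_msg.split()[-1]}"
    if "what is my name" in user_msg.lower():
        for role, msg in history:
            if role == "user" and msg.lower().startswith("my name is"):
                return f"Your name is {msg.split()[-1]}"
        return "I don't know"
    return "Hi"

def task(example: dict) -> str:
    """Run all turns in order; only the final agent reply is the output."""
    history: list[tuple[str, str]] = []
    last = ""
    for user_msg in example["turns"]:
        last = reply(history, user_msg)
        history.append(("user", user_msg))
        history.append(("agent", last))
    return last

def name_recalled(output: str) -> float:
    """Evaluator applied only to the final reply."""
    return 1.0 if "Ihor" in output else 0.0

# One dataset example holds the full 4-turn conversation:
example = {"turns": ["Hi", "What is my name?", "My name is Ihor", "What is my name"]}
```

With this shape, the dataset example is the whole conversation rather than one request, so the experiment records one task run per conversation and the evaluator sees only the final answer.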
