RunLLM is it possible to reference the reference output in the prompt playground within my prompt templates?
Yes, you can reference the reference output in the Prompt Playground by using a variable in your prompt template that matches the column name for the reference output in your dataset (commonly {{reference}}). When you run a prompt over a dataset, Phoenix fills in these template variables, including the reference output, for each example at runtime, allowing you to use the reference text directly in your prompt template for tasks like RAG or Q&A evaluation. To do this, ensure your dataset includes a column for the reference output (e.g., reference) and use {{reference}} in your prompt template. When the prompt is executed, Phoenix substitutes the value from the dataset for each row. For example:

[Question]: {{input}}
[Reference text]: {{reference}}

According to Phoenix documentation and Prompt Concepts, this is the standard way to dynamically inject reference outputs into your prompts. Would you like more detail or a code example?

Sources:
- Phoenix documentation
- Prompt Concepts
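As a minimal, illustrative sketch of that per-row substitution (plain Python, not Phoenix internals; the column names input and reference are just examples), each {{variable}} in the template is filled from the matching dataset column for every row:

```python
import re

import pandas as pd

# Illustrative dataset: one row per example, with the reference output in its own column.
df = pd.DataFrame(
    {
        "input": ["What is Phoenix?"],
        "reference": ["Phoenix is an open-source LLM observability library."],
    }
)

template = "[Question]: {{input}}\n[Reference text]: {{reference}}"

def render(template: str, row: pd.Series) -> str:
    # Replace each {{variable}} with the value from the matching column of this row.
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(row[m.group(1)]), template)

for _, row in df.iterrows():
    print(render(template, row))
```

The key point is that the template variable names must match the dataset column names exactly.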
The error means your dataset does not have a column named reference, but your prompt template uses {{reference}}. Only the input column is available, so Phoenix cannot fill the {{reference}} variable in your prompt. To fix this, either add a reference column to your dataset or remove {{reference}} from your prompt template. For more, see Phoenix Playground error handling and Prompt Concepts. Would you like step-by-step instructions to resolve this?

Sources:
- Phoenix Playground error handling
- Prompt Concepts
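As a sketch of the first fix, assuming your examples live in a pandas DataFrame before upload (the column name retrieved_docs is hypothetical), rename or copy an existing column so it matches the {{reference}} template variable:

```python
import pandas as pd

# Hypothetical export: the reference text exists, but under a different column name.
df = pd.DataFrame(
    {
        "input": ["What is Phoenix?"],
        "retrieved_docs": ["Phoenix is an open-source LLM observability library."],
    }
)

print(df.columns.tolist())  # check which columns the dataset actually has

# Rename (or copy) the column so it lines up with {{reference}} in the prompt template.
df = df.rename(columns={"retrieved_docs": "reference"})
```

Alternatively, edit the prompt template so it only uses variables that exist as columns.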
RunLLM in my case, I want to use the prompt playground to run a prompt over a dataset of traces of the retriever type. my retrieved documents are reference outputs. therefore, I want the LLM to see these outputs and judge the retrieved results. that's why I need to be able to reference the reference outputs (not just inputs) in my prompt. does that make sense? or am I misusing phoenix?
Your use case makes sense and aligns with Phoenix's intended RAG evaluation workflows: referencing retrieved documents (reference outputs) in prompts to let the LLM judge their relevance. However, in the Prompt Playground, only columns explicitly present in your dataset can be referenced as template variables (e.g., {{reference}}). If your dataset of retriever traces does not include a reference (or similar) column, Phoenix cannot inject those values into the prompt, which is why you see the missing variable error.

To achieve your goal, ensure your dataset includes a column (e.g., reference or retrieved_docs) containing the retrieved documents for each trace. Then reference that column in your prompt template (e.g., {{reference}}). This approach is standard for RAG and retrieval evaluation in Phoenix, as described in Prompt Concepts and Testing Prompts at Scale. Would you like detailed steps or a code example for preparing your dataset?

Sources:
- Prompt Concepts
- Testing Prompts at Scale
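If a code example helps, here is one possible way to pull retriever spans and their retrieved documents into a DataFrame using the span query DSL from the Phoenix docs. Treat it as a sketch: the attribute names (span_kind, retrieval.documents, document.content) follow OpenInference conventions and may differ depending on your instrumentation and Phoenix version.

```python
import phoenix as px
from phoenix.trace.dsl import SpanQuery

# Select retriever spans, keep the query text as "input", and concatenate the
# retrieved document contents into a single "reference" value per span.
query = (
    SpanQuery()
    .where("span_kind == 'RETRIEVER'")
    .select(input="input.value")
    .concat("retrieval.documents", reference="document.content")
)

retrieved_df = px.Client().query_spans(query)
print(retrieved_df.head())  # expected columns: input, reference (one row per retriever span)
```

The resulting DataFrame can then be uploaded as a dataset with both columns exposed as template variables.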
This is a known issue: even if your dataset includes a reference column, the Prompt Playground may only expose input as a usable variable if the dataset was created or loaded in a way that doesn't map the reference output as an input variable. This limitation and the related confusion have been discussed in GitHub issues and release notes, with improvements to error handling and variable mapping released in recent Phoenix versions (see here, issue discussion).

To resolve this, ensure your dataset's reference column is mapped as an input variable when uploading or creating the dataset, not just as an output. If the Playground still doesn't recognize it, this is a UI limitation or bug; recent updates have improved error messaging, but the underlying mapping issue may persist in some cases. Would you like guidance on how to structure your dataset or a workaround?

Sources:
- see here
- issue discussion
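A minimal sketch of that mapping, assuming the Phoenix client's upload_dataset method (dataset and column names here are illustrative): include the reference column in input_keys so the Playground can expose {{reference}} as a template variable, rather than listing it only under output_keys.

```python
import pandas as pd
import phoenix as px

df = pd.DataFrame(
    {
        "input": ["What is Phoenix?"],
        "reference": ["Phoenix is an open-source LLM observability library."],
    }
)

px.Client().upload_dataset(
    dataset_name="retrieval-judge-examples",
    dataframe=df,
    # Mapping "reference" as an input key is what makes {{reference}} available
    # as a Playground template variable; mapping it only as an output would not.
    input_keys=["input", "reference"],
    output_keys=[],
)
```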
You are correct: currently, when you add traces to a dataset, Phoenix does not automatically map reference outputs (e.g., retrieved documents) as input variables for prompt templates in the Prompt Playground. This limitation means you cannot immediately use those reference outputs in your prompts without additional data transformation or remapping, which has been noted as a workflow friction point in user discussions (see here).

This is a known limitation, and there are open enhancement requests to improve trace-to-dataset workflows and variable mapping for seamless prompt experimentation. Would you like suggestions for workarounds or updates on planned improvements?

Sources:
- see here
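As a stopgap until that improves, one workaround is to do the remapping yourself: export the relevant spans to a DataFrame, move the retrieved documents into an input-style reference column, and upload the result as a new dataset. The sketch below assumes the client's get_spans_dataframe export and the flattened attribute column names; both may differ for your Phoenix version and instrumentation.

```python
import phoenix as px

client = px.Client()

# Export retriever spans; the filter string and column names are assumptions.
spans_df = client.get_spans_dataframe("span_kind == 'RETRIEVER'")

def join_documents(docs) -> str:
    # Retriever spans typically store a list of document dicts; flatten them into
    # one block of text that the judge prompt can read as {{reference}}.
    if not isinstance(docs, (list, tuple)):
        return ""
    return "\n\n".join(
        str(d.get("document.content", d)) if isinstance(d, dict) else str(d)
        for d in docs
    )

examples_df = spans_df.assign(
    input=spans_df["attributes.input.value"],
    reference=spans_df["attributes.retrieval.documents"].apply(join_documents),
)[["input", "reference"]]

client.upload_dataset(
    dataset_name="retriever-spans-remapped",
    dataframe=examples_df,
    input_keys=["input", "reference"],  # both become Playground template variables
)
```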
Hey Andrew, your request makes a lot of sense. We tend to think of the reference as the area where ground truth lives, so it's where the evaluators will eventually pull data from for reference-based evaluations. But I think making it possible to "chat" with your spans like this makes sense too. I filed an issue. Let me think through the design. It's sorta like you want the prompt variables to be scoped to the entire example so you can do input / output / metadata in your prompt. https://github.com/Arize-ai/phoenix/issues/10912
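To make the idea concrete, here is a purely hypothetical sketch of what example-scoped variables might look like in a judge template. This syntax does not exist in Phoenix today; it only illustrates the proposal in the issue above, and the field names are made up.

```python
# Hypothetical only: illustrates "prompt variables scoped to the entire example"
# (input / output / metadata); this is NOT current Phoenix template syntax.
judge_template = """
[Question]: {{input.question}}
[Retrieved documents]: {{output.documents}}
[Retriever]: {{metadata.retriever_name}}

Rate how relevant the retrieved documents are to the question (1-5) and explain why.
"""
```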
exactly! that would be amazing. It would dramatically improve the user experience for non-technical domain experts. My goal is to enable them to work comfortably and effectively entirely within Phoenix without needing developers in the loop. As it stands, there is little they can do in Phoenix without relying on developers to run scripts (experiments and processing traces to populate datasets). I want them to be able to use our application, then easily find the relevant traces within Phoenix, play with them in the prompt playground (i.e., running ad-hoc LLM judge prompts against traces), annotate traces, and add examples to datasets. Right now this workflow is high friction. thanks for considering 🙏
Makes too much sense. Will get something cooking
