Hi, I really love the Playground feature where you can compare different prompt versions, swap models, etc. From what I've seen so far, though, the prompt in use is always static (granted, you can create variables and inject them into the prompt), but there is no way to integrate e.g. dynamic few-shot examples (meaning: a retrieval system that, depending on the input, retrieves semantically similar examples and injects the best candidates into the prompt), is there? Thanks for a short reply! 🙏
It seems likely that Phoenix's Prompt Playground allows you to create and test prompt templates with variables, which can be filled in manually or via datasets, but it does not appear to support dynamic retrieval systems for few-shot examples. According to the Prompt Playground documentation, you can use template languages like mustache or f-string to create prompt variables, but integrating a dynamic retrieval system for few-shot examples is not explicitly covered. Would you like more detail on how to set up and use prompt templates in the Playground? Sources: - Prompt Playground documentation
We currently only support dynamic prompt content via the methods described by RunLLM above: variable injection, where you fill in values by typing inputs in the Playground UI, or variables that are filled from dataset values. I would love to hear your use cases for data retrieval/dynamic prompting in the Playground; feel free to share them here and I can file an enhancement ticket for the team to go over.
Anthony P. Thanks for your reply. The use case is information extraction. Depending on the input text, I would like to inject the few-shot examples that are closest in terms of semantic similarity, so that the model can learn from examples showing successful extractions of similar input texts. Does this make sense?
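(As an aside, the dynamic few-shot idea described above can be sketched in a few lines of plain Python. Everything here is illustrative: `embed` is a toy bag-of-words stand-in for a real embedding model, and the mustache-style `{{few_shot}}`/`{{input}}` slots are hypothetical template variables, not a Phoenix feature.)

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; swap in a real embedding model in practice.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_few_shot(query: str, examples: list[dict], k: int = 2) -> list[dict]:
    # Rank stored examples by similarity to the incoming input text.
    q = embed(query)
    ranked = sorted(examples, key=lambda ex: cosine(q, embed(ex["input"])), reverse=True)
    return ranked[:k]

def build_prompt(template: str, query: str, examples: list[dict]) -> str:
    # Splice the selected examples into a mustache-style template.
    shots = "\n".join(f"Input: {ex['input']}\nOutput: {ex['output']}" for ex in examples)
    return template.replace("{{few_shot}}", shots).replace("{{input}}", query)

EXAMPLES = [
    {"input": "invoice from ACME for 200 USD", "output": '{"vendor": "ACME", "amount": 200}'},
    {"input": "the weather is nice today", "output": "{}"},
]
TEMPLATE = "Extract fields.\n{{few_shot}}\nInput: {{input}}\nOutput:"

query = "invoice from Globex for 50 USD"
prompt = build_prompt(TEMPLATE, query, select_few_shot(query, EXAMPLES, k=1))
```

With the toy data above, the invoice example ranks highest for an invoice-like query, so only that extraction example lands in the prompt.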
Got it, it does. Do you have specific data sources or retrieval mechanisms in mind?
No, nothing specific. I currently perform a hybrid search (BM25 plus embeddings), but does the type of retrieval really matter? Wouldn't it be easier to provide access to a function and let the user decide what that function returns? Or do you have any ideas in mind?
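(The "just expose a function" idea could look something like this sketch: the caller depends only on a `retrieve(query, k)` callable, and the user decides what it returns. The hybrid scorer below is a toy: `term_overlap` is a crude stand-in for BM25, the cosine term stands in for embeddings, and the `alpha` weight and all names are illustrative, not any real API.)

```python
import math
from collections import Counter
from typing import Callable

# The only contract: a callable that maps (query, k) to the top-k documents.
Retriever = Callable[[str, int], list[str]]

def term_overlap(query: str, doc: str) -> float:
    # Crude lexical score standing in for BM25.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def cos_sim(query: str, doc: str) -> float:
    # Bag-of-words cosine standing in for embedding similarity.
    a, b = Counter(query.lower().split()), Counter(doc.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def make_hybrid_retriever(docs: list[str], alpha: float = 0.5) -> Retriever:
    # Blend the two scores; the user is free to replace this wholesale.
    def retrieve(query: str, k: int) -> list[str]:
        scored = sorted(
            docs,
            key=lambda d: alpha * term_overlap(query, d) + (1 - alpha) * cos_sim(query, d),
            reverse=True,
        )
        return scored[:k]
    return retrieve

retrieve = make_hybrid_retriever([
    "extract vendor and amount from invoices",
    "summarize meeting notes",
])
best = retrieve("invoice amount extraction", 1)
```

Because the host only sees the `Retriever` signature, BM25, embeddings, or any hybrid can be swapped in without the host code changing.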
Gathering data on what kinds of retrieval users would like informs whether we expose custom code execution to users via the UI, or build a first-class integration with UI-based configuration. It can be tough to balance whether we add something service-specific or totally custom. Mikyo, do you have an opinion here? I don't think we have plans to support retrieval in the Playground in the near term.
The issue I see with a potential first-class integration is that you not only have to take care of the retrieval, you also have to offer an option that lets the user perform the indexing (as opposed to the custom-code-execution option, where this would be left to the user). Curious to hear further thoughts on this 🙂
Hey Daniel, your use case makes a ton of sense, but it is a bit tricky to add dynamic elements into the Playground. That's partly why we also have the ability to run prompt experiments via code, so you can use things like DSPy or MIPRO to optimize the few-shot examples in code. Right now the Playground can perform sweeps over datasets, which could be leveraged in some capacity for certain inputs. I think at some point we probably will integrate some sort of API support for things like retrievers, but we aren't investing heavily in that right now. It will be interesting to see whether things like OpenAI's Files API or Anthropic's MCP become the de facto standard, as those could probably be leveraged more easily in a standard manner. But I hear ya: bootstrapping few-shot examples in the UI would be amazing. It's just a bit easier to do it in code right now.
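(A code-driven prompt experiment of the kind mentioned above can be sketched generically. This is not Phoenix's experiment API; `run_sweep` and all the stubs below are hypothetical names showing the shape of a sweep where each dataset row can get its own dynamically selected few-shot examples before the model is called.)

```python
def run_sweep(dataset, template, select_shots, call_model, score):
    # For each row: pick few-shot examples, fill the template, call the
    # model, and score the output against the expected value.
    results = []
    for row in dataset:
        shots = select_shots(row["input"])
        prompt = template.format(few_shot=shots, input=row["input"])
        output = call_model(prompt)
        results.append({
            "input": row["input"],
            "output": output,
            "score": score(output, row["expected"]),
        })
    return results

# Stubs standing in for a real retriever, LLM client, and evaluator.
dataset = [{"input": "invoice from ACME for 200 USD", "expected": "ACME"}]
results = run_sweep(
    dataset,
    "Examples:\n{few_shot}\nInput: {input}\nOutput:",
    select_shots=lambda text: "Input: bill from Initech\nOutput: Initech",
    call_model=lambda prompt: "ACME",  # echo stub instead of an LLM call
    score=lambda out, exp: float(out == exp),
)
```

The point is that `select_shots` runs per row, which is exactly the hook a dynamic few-shot retriever would plug into when running experiments in code.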
Thanks for sharing your thoughts 🙏
Mikyo Anthony P. LangSmith supports this and it's very useful, see: https://docs.smith.langchain.com/evaluation/how_to_guides/index_datasets_for_dynamic_few_shot_example_selection https://docs.smith.langchain.com/evaluation/how_to_guides/create_few_shot_evaluators https://blog.langchain.dev/aligning-llm-as-a-judge-with-human-preferences/
They also have this example, which is not a feature, just a notebook, but you could have something similar in the docs/notebooks: https://docs.smith.langchain.com/prompt_engineering/tutorials/optimize_classifier
Few-shot examples are added to your evaluator prompt using the {{Few-shot examples}} variable
Creating an evaluator with few-shot examples will automatically create a dataset for you, which will be auto-populated with few-shot examples once you start making corrections
At runtime, these examples will be inserted into the evaluator to serve as a guide for its outputs - this will help the evaluator better align with human preferences
The few-shot search in the SDK is also cool: built-in similarity search on the datasets via a `similar_examples` function:

```python
examples = await client.similar_examples(
    {"question": "knock knock"},
    dataset_id=dataset_id,
    limit=1,
)
```
Any plans to add this to Phoenix, especially the evaluator-prompt dynamic variable?
Tiago F. that is a very nice feature! I'll definitely file an issue and think through it a bit more. I think the only tricky part is that we don't have similarity indexes created for dataset examples. Can you tell me more about how you would use this in templates for evaluation? I think I get it but want to make sure. I'd like to file a GitHub issue to see if the community can chime in on different use cases and solutions
