Hi, I am new to Phoenix, and overall think it's a pretty splendid project, but I get the feeling it's rapidly evolving to the point where the documentation is leaving me quite confused as a newcomer. An updated evaluation framework (v2) was just released, but is https://arize-phoenix.readthedocs.io/projects/evals/en/latest/index.html the sole source of documentation for it? It seems that the main docs at https://arize.com/docs/phoenix/evaluation/llm-evals still use the legacy framework. It leaves me with some questions:
How do I use evals together with LiteLLM with a custom model or endpoint? The LLM class seems very restricted.
How does the new evals framework interact with the dataset/experiments framework?
To use LiteLLM with a custom model or endpoint in Phoenix Evals v2, instantiate LiteLLMModel and pass model-specific parameters via the model_kwargs argument. For custom endpoints (e.g., Anthropic, Ollama, or your own proxy), set the required environment variables or include endpoint details in model_kwargs as needed by LiteLLM. Example:

```
import os

from phoenix.evals import LiteLLMModel

os.environ["OLLAMA_API_BASE"] = "http://localhost:11434"  # for Ollama

model = LiteLLMModel(
    model="ollama/llama3",
    model_kwargs={"api_base": "http://your-custom-endpoint"},
)
```

You can find more details in the official Phoenix Evals v2 documentation and the LiteLLMModel usage guide. The main docs at arize.com still reference the legacy framework, so prefer the readthedocs site for v2.

Would you like a step-by-step example for a specific provider or endpoint?

Sources:
- LiteLLMModel Example
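For completeness, here is how a LiteLLMModel object like the one above is typically plugged into an actual eval run with the legacy framework's llm_classify helper. This is a minimal sketch, assuming the relevance template and a toy dataframe; the data and model choice are purely illustrative:

```
import pandas as pd

from phoenix.evals import (
    LiteLLMModel,
    RAG_RELEVANCY_PROMPT_RAILS_MAP,
    RAG_RELEVANCY_PROMPT_TEMPLATE,
    llm_classify,
)

# Toy data; the relevance template expects "input" and "reference" columns.
df = pd.DataFrame(
    {
        "input": ["What is Phoenix?"],
        "reference": ["Phoenix is an open-source LLM observability platform."],
    }
)

# Any LiteLLM route works here, e.g. a local Ollama model.
model = LiteLLMModel(model="ollama/llama3")

# Classify each row against the template's rails (relevant / unrelated).
evals_df = llm_classify(
    dataframe=df,
    model=model,
    template=RAG_RELEVANCY_PROMPT_TEMPLATE,
    rails=list(RAG_RELEVANCY_PROMPT_RAILS_MAP.values()),
    provide_explanation=True,
)
print(evals_df[["label", "explanation"]])
```

Note that this is the legacy-framework path; as the next message points out, LiteLLMModel is not part of the v2 API.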
You are correct: LiteLLMModel is part of the legacy Phoenix evaluation framework and does not appear in the API listing for the new v2 evaluation framework (see v2 docs). The v2 framework uses a new abstraction, typically via the LLM class and adapters, and does not expose LiteLLMModel as a public API.

Would you like more detail on how to use custom models or endpoints with the v2 evaluation framework?

Sources:
- see v2 docs
To use a custom model or endpoint with the Phoenix Evals v2 framework, you should use the new LLM abstraction, which allows you to specify the provider, model, and connection details directly. For example, to use a custom endpoint (such as a LiteLLM proxy or an OpenAI-compatible API), you can do:

```
from phoenix.evals.preview.llm import LLM

llm = LLM(
    provider="openai",  # or "litellm" if supported
    model="your-model-name",
    api_key="your-api-key",
    base_url="http://your-custom-endpoint",
)
```

This approach is documented in the phoenix-evals preview README. The v2 framework is designed to be adapter-based, so you can use any OpenAI-compatible endpoint by specifying the correct provider and base URL.

Would you like a full example of running an evaluation with this setup?

Sources:
- phoenix-evals preview README
Yes, you can use LiteLLM with Ollama by setting the appropriate environment variable (OLLAMA_API_BASE) and initializing the LLM with the correct model name. The v2 framework's LLM abstraction does not officially document support for arbitrary kwargs, but for LiteLLM/Ollama the standard approach is:

```
import os

from phoenix.evals.preview.llm import LLM

os.environ["OLLAMA_API_BASE"] = "http://localhost:11434"

llm = LLM(provider="litellm", model="ollama/llama3")
```

This pattern is shown in the Phoenix documentation and code examples. Passing additional kwargs directly to LLM is not documented for v2, so rely on environment variables for endpoint configuration.

Would you like a complete evaluation example using this setup?

Sources:
- Phoenix evaluation-models.md
RunLLM litellm does not appear to be a valid provider value at all; it can only be used as a client? Could you also touch on my second question, "how does the new evals framework interact with the dataset/experiments framework?" Namely, can I do evaluation as part of executing an experiment, or will I have to execute it with run-evals or similar and then upload manually?
You are correct: in the Phoenix Evals v2 framework, litellm is not a valid provider; it is used as a client argument, while the provider should be the actual model provider (e.g., "openai", "anthropic", etc.). For example:

```
from phoenix.evals.preview.llm import LLM

llm = LLM(provider="openai", model="gpt-3.5-turbo", client="litellm")
```

This is confirmed in the LLM wrapper documentation.

Regarding integration with the dataset/experiments framework: as of Evals v2, evaluation is decoupled from the experiments framework. You typically run evaluations separately (e.g., with run-evals or programmatically), then upload or log results to Phoenix. Direct integration, where evaluation is performed as part of an experiment execution, is not currently documented for v2; you must run evals and experiments as separate steps and manage results accordingly (see v2 docs).

Would you like more detail or a workflow example for this process?

Sources:
- LLM wrapper documentation
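To make that two-step workflow concrete, here is a minimal sketch of the classic "run evals, then log results back" loop over traces. It uses the legacy llm_classify helper, the hallucination template, and the documented get_qa_with_reference helper purely as an illustration; the v2 equivalents are not settled yet:

```
import phoenix as px
from phoenix.evals import (
    HALLUCINATION_PROMPT_RAILS_MAP,
    HALLUCINATION_PROMPT_TEMPLATE,
    OpenAIModel,
    llm_classify,
)
from phoenix.session.evaluation import get_qa_with_reference
from phoenix.trace import SpanEvaluations

client = px.Client()

# 1. Pull question/answer/reference rows (indexed by span id) from Phoenix.
queries_df = get_qa_with_reference(client)

# 2. Run the eval with whatever model object you configured above.
evals_df = llm_classify(
    dataframe=queries_df,
    model=OpenAIModel(model="gpt-4o-mini"),
    template=HALLUCINATION_PROMPT_TEMPLATE,
    rails=list(HALLUCINATION_PROMPT_RAILS_MAP.values()),
    provide_explanation=True,
)

# 3. Log the results back so they show up in the Phoenix UI.
client.log_evaluations(
    SpanEvaluations(eval_name="Hallucination", dataframe=evals_df)
)
```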
Apologies for the out-of-date documentation. We're working on a full revamp. Let us know if there's anything we can do in the meantime.
Mikyo sounds awesome 🙂 We are currently using GitHub Models for prototyping, and it generally plays nice with LangChain/LangGraph if I either treat it as an OpenAI model with a custom api_base/api_key, OR if I use LiteLLM with the github/ prefix. Is there anything I can do to run v2 evals with either of those modes? Something like wrapping a LangChain chat interface or passing api_base/api_key?
would also be interested to hear a human perspective on how evals v2 is intended to work with datasets/experiments 🙂
This is one of the core motivations actually. They should "just work" on most cases. We should have much better documentation on this in the coming week.
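For anyone reading along, the general shape of the experiments hook already looks like this today. A minimal sketch, assuming a dataset already uploaded to Phoenix; the dataset name, the task body, and my_chain are placeholders, and v2 LLM evaluators are expected to slot into the evaluators list the same way as the plain function below:

```
import phoenix as px
from phoenix.experiments import run_experiment

# Assumes the dataset was previously uploaded, e.g. via px.Client().upload_dataset(...).
dataset = px.Client().get_dataset(name="my-prototype-dataset")

def task(input):
    # Call your application here (LangChain/LangGraph chain, etc.).
    # The key used below depends on your dataset's input schema.
    return my_chain.invoke(input["question"])

def contains_answer(output, expected) -> bool:
    # A trivial code-based evaluator; LLM evaluators plug in the same way.
    return expected["answer"].lower() in str(output).lower()

experiment = run_experiment(
    dataset,
    task,
    evaluators=[contains_answer],
    experiment_name="prototype-eval",
)
```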
Let me check with the team. If it works with LangChain or LiteLLM it should work but we may have not known about the github prefix. (https://github.com/Arize-ai/phoenix/issues/9685)
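Until that's confirmed, one thing worth trying is the OpenAI-compatible route with the v2 LLM wrapper shown earlier in the thread. This is a sketch only: the base URL below is illustrative (use whatever GitHub Models endpoint already works for you with LangChain), and the constructor arguments mirror the preview README example above, not a documented GitHub Models integration:

```
import os

from phoenix.evals.preview.llm import LLM

# Treat GitHub Models as a plain OpenAI-compatible endpoint:
# point base_url at the inference endpoint and authenticate with a GitHub token.
llm = LLM(
    provider="openai",
    model="gpt-4o-mini",  # placeholder model name
    api_key=os.environ["GITHUB_TOKEN"],
    base_url="https://models.github.ai/inference",  # illustrative URL
)
```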