Hi Xander S., Anthony P.,
Sorry if my question was not clear enough; I will try to explain it with examples.
I tried to reproduce your quickstarts from the documentation, including the following:
- Evals (https://docs.arize.com/phoenix/evaluation/evals)
- Datasets and experiments (https://docs.arize.com/phoenix/datasets-and-experiments/quickstart-datasets)
As I understand it, one of the main differences between Evals and Experiments is that in Experiments we also have a task to perform before evaluation. Is that correct?
Then I found that in both scenarios you use OpenAIModel as the example evaluation model.
For our use case, we want to experiment with the LangChain/LangGraph frameworks and use different LLMs (and Runnable chains built on them) that integrate with LangChain as evaluators.
In the Evals quickstart, you use predefined evaluators with OpenAIModel as their eval model:
```python
from phoenix.evals import HallucinationEvaluator, OpenAIModel, QAEvaluator

# Set your OpenAI API key (e.g. via the OPENAI_API_KEY environment variable)
eval_model = OpenAIModel(model="gpt-4o-mini")

# Define your evaluators
hallucination_evaluator = HallucinationEvaluator(eval_model)
qa_evaluator = QAEvaluator(eval_model)
```
But I could not find a way to use my LangChain LLM/chain with your pre-built evaluators, or to create a custom evaluator, for the run_evals function used in this quickstart.
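To make clear what I would need from such a custom evaluator, here is a rough, framework-free sketch of the "classify with rails" pattern I understand the pre-built evaluators to follow (fill a template, make one judge call, snap the completion onto an allowed label). All names here are my own, not Phoenix API; `call_llm` is a stand-in for a LangChain chain invocation:

```python
from typing import Callable, Sequence

NOT_PARSABLE = "NOT_PARSABLE"

def classify_with_rails(
    template: str,
    variables: dict,
    rails: Sequence[str],
    call_llm: Callable[[str], str],
) -> str:
    """Fill the template, call the judge once, snap the raw output to a rail."""
    prompt = template.format(**variables)
    raw = call_llm(prompt).lower().strip()
    return raw if raw in rails else NOT_PARSABLE

# Offline demo with a stub judge in place of a real LLM call:
label = classify_with_rails(
    "Is '{answer}' correct for '{question}'? Reply accurate or inaccurate.",
    {"question": "2+2", "answer": "4"},
    rails=("accurate", "inaccurate"),
    call_llm=lambda prompt: " Accurate\n",
)
print(label)  # accurate
```

If the LLM interface expected by HallucinationEvaluator/QAEvaluator were documented, I could plug a LangChain chain in as the `call_llm` piece above.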
For the Experiments case, you are right that we can create a custom evaluator as a function and use it in the run_experiment function.
I created an example of a custom evaluator for a LangChain Runnable chain (based on the example from this quickstart), and it works correctly:
```python
from typing import Any, Dict

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

from phoenix.experiments.evaluators import create_evaluator

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

eval_prompt_template = """
Given the QUESTION and REFERENCE_ANSWER, determine whether the ANSWER is accurate.
Output only a single word (accurate or inaccurate).
QUESTION: {question}
REFERENCE_ANSWER: {reference_answer}
ANSWER: {answer}
ACCURACY (accurate / inaccurate):
"""

@create_evaluator(kind="llm")  # need the decorator, or kind will default to "code"
def accuracy(input: Dict[str, Any], output: str, expected: Dict[str, Any]) -> float:
    prompt = ChatPromptTemplate.from_messages([("user", eval_prompt_template)])
    output_parser = StrOutputParser()
    chain = prompt | llm | output_parser
    response = chain.invoke(
        {
            "question": input["question"],
            "reference_answer": expected["answer"],
            "answer": output,
        }
    ).lower().strip()
    return 1.0 if response == "accurate" else 0.0
```
But I am confused: OpenAIModel works in both cases when passed to the evaluators argument of run_evals and run_experiment, yet for LangChain I can only use Experiments via the @create_evaluator decorator, and that evaluator cannot be reused in the Evals case. Passing the same custom evaluator to run_evals fails with the following error:
AttributeError: 'SyncEvaluator' object has no attribute 'default_concurrency'
Is there a universal way to create a custom evaluator that works for both the Evals and Experiments cases?
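In case it helps to show what I am after, here is a minimal, offline sketch of the workaround I am considering: factor the scoring logic into a plain function with the judge injected, so the same core could be wrapped once for run_experiment and, if it were supported, once for run_evals. `fake_judge` is a hypothetical stand-in for the LangChain chain's `.invoke`:

```python
from typing import Callable, Dict

def make_accuracy_scorer(
    invoke_judge: Callable[[Dict[str, str]], str],
) -> Callable[[str, str, str], float]:
    """Wrap an injected judge call, mapping its verdict to a 1.0/0.0 score."""
    def score(question: str, reference_answer: str, answer: str) -> float:
        verdict = invoke_judge(
            {
                "question": question,
                "reference_answer": reference_answer,
                "answer": answer,
            }
        ).lower().strip()
        return 1.0 if verdict == "accurate" else 0.0
    return score

# Hypothetical stand-in for chain.invoke so the sketch runs offline:
def fake_judge(inputs: Dict[str, str]) -> str:
    return "accurate" if inputs["answer"] == inputs["reference_answer"] else "inaccurate"

scorer = make_accuracy_scorer(fake_judge)
print(scorer("What is 2+2?", "4", "4"))  # 1.0
print(scorer("What is 2+2?", "4", "5"))  # 0.0
```

The framework-specific adapters would then be thin wrappers around `scorer`, so the judging chain itself is defined only once.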