Troubleshooting LangChain App Evaluation Issues with Phoenix

xuejiao w. · 2024-02-22T06:04:54.004Z

Hi there, I followed this tutorial (https://github.com/Arize-ai/phoenix/blob/main/tutorials/tracing/langchain_tracing_tutorial.ipynb) to trace and evaluate my own LangChain-based app. I can see the traces (input, output) in the UI, but get_qa_with_reference() and get_retrieved_documents() don't return any results, so I get no evaluation results. Any ideas? Thanks!

session = px.launch_app()
LangChainInstrumentor().instrument()
for question in self.question_pool:
    self.chat_app.get_answer(question)
queries_df = get_qa_with_reference(px.Client())
retrieved_documents_df = get_retrieved_documents(px.Client())
print(f"queries_df: {queries_df}")
print(f" retrieved_documents_df: {retrieved_documents_df}")

Mikyo
·
Hi xuejiao w., is your app a RAG application? If not, those utilities might not yield results. For a more in-depth way to extract data, check out our docs: https://docs.arize.com/phoenix/how-to/extract-data-from-spans
xuejiao w.
·
Mikyo Yes, my app is a RAG app. Is there an issue in my code? Thanks for sharing another way. Let me try it.

xuejiao w.

Hi Mikyo, I tried the example (get retrieved documents) from https://docs.arize.com/phoenix/how-to/extract-data-from-spans but it doesn't output any results. I checked the UI and there is no 'RETRIEVER' span. Any idea? Thanks!

import phoenix as px
from phoenix.trace.dsl import SpanQuery

def get_retrieved_documents():
    query = SpanQuery().where(
        # Filter for the `RETRIEVER` span kind.
        # The filter condition is a string of valid Python boolean expression.
        "span_kind == 'RETRIEVER'",
    ).select(
        # Extract the span attribute `input.value` which contains the query for the
        # retriever. Rename it as the `input` column in the output dataframe.
        input="input.value",
    ).explode(
        # Specify the span attribute `retrieval.documents` which contains a list of
        # objects and explode the list. Extract the `document.content` attribute from
        # each object and rename it as the `reference` column in the output dataframe.
        "retrieval.documents",
        reference="document.content",
    )

    # The Phoenix Client can take this query and return the dataframe.
    df = px.Client().query_spans(query)
    return df
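
To make the filter and explode steps in the query above easier to picture, here is a rough plain-Python sketch of the same idea. It is not how Phoenix implements SpanQuery; the flat dictionaries, the `span_id` key, and the sample texts are assumptions made up for illustration. Only the attribute names `span_kind`, `input.value`, `retrieval.documents`, and `document.content` come from the snippet above.

```python
from typing import Any, Dict, List


def explode_documents(spans: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return one row per retrieved document, loosely mimicking where + select + explode."""
    rows: List[Dict[str, Any]] = []
    for span in spans:
        # Mirror the filter: only RETRIEVER spans contribute rows.
        if span.get("span_kind") != "RETRIEVER":
            continue
        for position, doc in enumerate(span.get("retrieval.documents") or []):
            rows.append(
                {
                    "span_id": span.get("span_id"),  # hypothetical identifier
                    "document_position": position,
                    "input": span.get("input.value"),
                    "reference": doc.get("document.content"),
                }
            )
    return rows


# Hypothetical spans: one chain span and one retriever span with two documents.
example_spans = [
    {"span_id": "a", "span_kind": "CHAIN", "input.value": "What is Phoenix?"},
    {
        "span_id": "b",
        "span_kind": "RETRIEVER",
        "input.value": "What is Phoenix?",
        "retrieval.documents": [
            {"document.content": "Phoenix is an observability tool."},
            {"document.content": "Phoenix traces LLM apps."},
        ],
    },
]
exploded_rows = explode_documents(example_spans)
```

Note that when no span has the kind RETRIEVER, the function returns an empty list, which is consistent with the empty result described in this message.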

xuejiao w.
·
My main code is simple, and I can see the input/output in the UI.
Mikyo
·
I see, xuejiao w. I'm not sure what your traces look like. You can get the spans out via px.Client().get_spans_dataframe(). Is it possible for you to show me what a trace tree looks like? Also, what evaluation criteria do you have?
Mikyo
·
As you mention, you won't have documents if you don't have retrievers
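
One quick way to confirm whether any retriever spans were recorded is to tally the span kinds. The sketch below works on plain dictionaries so it runs on its own; in practice the records could plausibly come from `px.Client().get_spans_dataframe().to_dict("records")`, but the `span_kind` column name and the sample span names here are assumptions, not something confirmed in this thread.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def count_span_kinds(records: Iterable[Dict[str, Any]]) -> Counter:
    """Tally how many spans exist for each span kind."""
    return Counter(str(record.get("span_kind", "UNKNOWN")) for record in records)


# Hypothetical records standing in for rows of a spans dataframe.
example_records = [
    {"name": "ConversationalRetrievalChain", "span_kind": "CHAIN"},
    {"name": "StuffDocumentsChain", "span_kind": "CHAIN"},
    {"name": "ChatOpenAI", "span_kind": "LLM"},
]
kind_counts = count_span_kinds(example_records)

# A Counter returns 0 for missing keys, so this is safe when no retriever exists.
has_retriever = kind_counts["RETRIEVER"] > 0
```

If `has_retriever` is false, queries that filter on `span_kind == 'RETRIEVER'` have nothing to return, regardless of how the query itself is written.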
xuejiao w.
·
This is the trace I see in the UI. Is this what you asked for?
xuejiao w.
·
For evaluation, I want to use the Phoenix built-in evaluators QAEvaluator and HallucinationEvaluator.
xuejiao w.
·
I can see the documents in 'input', but there is no retriever span. I need the retrieved documents to run the qa_correctness and hallucination evaluations.
Mikyo
·
I see, yes, I'm not quite sure why you lack retrievers. I've filed a ticket here: https://github.com/Arize-ai/openinference/issues/239 to track the issue. xuejiao w. it looks like you may not be using something like a vector store for retrieval? What are you using to pull the page_content documents?
xuejiao w.
·
I traced the code. It looks like the retriever uses "from langchain.schema import BaseRetriever".
xuejiao w.
·
self.vector_store = Chroma( persist_directory=vector_store_path, embedding_resource=embedding_resource)
xuejiao w.
·
which uses the open-source chromadb
Mikyo
·
Got it, thanks so much for the details. I can't seem to find create_conversational_retrieval_with_score_chain in the langchain docs but we will try to repro on our end. Might be a few days before we can bottom out the problem though. It looks like you can probably parse out the documents from the input to pass to the evals. Let us know if you are able to get that working!
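
The workaround suggested above, parsing the documents out of the span input, might look roughly like the sketch below. It assumes the span's input value is a JSON string with an `input_documents` list whose items carry a `page_content` field; that shape, the key names, and the sample texts are guesses for illustration and should be checked against the actual trace. The `input`, `reference`, and `output` keys are the column names the built-in evaluators are generally described as expecting, which is also worth verifying against the Phoenix docs.

```python
import json
from typing import Dict, List


def extract_references(input_value: str, documents_key: str = "input_documents") -> List[str]:
    """Pull page_content strings out of a JSON-encoded chain input (assumed shape)."""
    try:
        payload = json.loads(input_value)
    except (TypeError, ValueError):
        # Not JSON (or not a string): nothing can be extracted.
        return []
    if not isinstance(payload, dict):
        return []
    docs = payload.get(documents_key) or []
    if not isinstance(docs, list):
        return []
    return [
        doc["page_content"]
        for doc in docs
        if isinstance(doc, dict) and isinstance(doc.get("page_content"), str)
    ]


def build_eval_record(question: str, input_value: str, answer: str) -> Dict[str, str]:
    """Assemble one row with the input/reference/output fields used for evaluation."""
    references = extract_references(input_value)
    return {"input": question, "reference": "\n\n".join(references), "output": answer}


# Hypothetical span input; the real payload may be structured differently.
example_input_value = json.dumps(
    {
        "question": "What is Phoenix?",
        "input_documents": [
            {"page_content": "Phoenix is an observability tool.", "metadata": {}},
            {"page_content": "Phoenix traces LLM apps.", "metadata": {}},
        ],
    }
)
example_record = build_eval_record(
    "What is Phoenix?", example_input_value, "An observability tool."
)
```

A list of such records can then be turned into a dataframe and handed to the evaluators in place of the output of get_qa_with_reference().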
