KeyError Issue with Relevance Evaluator in GCP Environment | Arize AI Community


Mikyo
·
Hey Teodor, the error seems to stem from the fact that root_retrieved_df doesn't contain relevance information. My guess is that it's related to how the DataFrame is created. It could also be a version difference, but I would probably have to do a semi-complicated git bisect to figure that out. Any chance you'd be willing to upgrade your local setup to match the GCP version? As you may have noticed, we moved evals out of experimental, so it would be good to have both environments in sync. We also have cool stuff coming, so I want you to upgrade for those reasons too 😉
Teodor C.
·
Mikyo, thanks for the reply. Makes sense. I previously got some help from Roger Y. with a scenario where I have multiple retrievers and I'm only interested in the root retriever, not the child retrievers. Here is what I'm doing now:
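import phoenix as px
from phoenix.trace.dsl import SpanQuery

# Root spans have no parent; root_df.index holds their span IDs.
root_query = SpanQuery().select("span_id").where("parent_id is None")
root_df = px.Client().query_spans(root_query)

# All retriever spans, exploded to one row per retrieved document.
retrieved_query = SpanQuery().where(
    "span_kind == 'RETRIEVER'",
).select(
    "parent_id",
    "span_kind",
    input="input.value",
).explode(
    "retrieval.documents",
    reference="document.content",
    document_score="document.score",
)
retrieved_df = px.Client().query_spans(retrieved_query)

# Keep only retriever spans whose parent is a root span.
root_retrieved_df = retrieved_df[retrieved_df.parent_id.isin(root_df.index)]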
What's interesting is that I expect retrieved_df to contain the reference and document score, since the exact same query returns them on my local machine but not on GCP. See the two attached pictures: the query is exactly the same.
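A KeyError raised deep inside the evaluator is hard to diagnose. The following is a minimal sketch, assuming the relevance evaluator reads the `input` and `reference` columns aliased in the query above (as far as I know, those are the relevance template's variables). It checks that the columns exist before applying the same root-parent filter, so a missing column fails with a readable message. `filter_root_retrievals`, `REQUIRED_COLUMNS`, and the toy span IDs are hypothetical.

```python
import pandas as pd

# Columns the filter and the relevance eval are assumed to need
# (the aliases defined in the SpanQuery above).
REQUIRED_COLUMNS = ("parent_id", "input", "reference")


def filter_root_retrievals(retrieved_df: pd.DataFrame, root_span_ids) -> pd.DataFrame:
    """Keep retriever rows whose parent span is a root span.

    Raises ValueError up front if an expected column is missing, instead of
    letting a KeyError surface later inside the evaluator.
    """
    missing = [c for c in REQUIRED_COLUMNS if c not in retrieved_df.columns]
    if missing:
        raise ValueError(
            f"retrieved_df is missing {missing}; got columns {list(retrieved_df.columns)}"
        )
    return retrieved_df[retrieved_df["parent_id"].isin(root_span_ids)]


# Toy data standing in for the query results (hypothetical span IDs).
root_ids = pd.Index(["root-1"])
retrieved = pd.DataFrame(
    {
        "parent_id": ["root-1", "child-7"],
        "input": ["q1", "q2"],
        "reference": ["doc a", "doc b"],
        "document_score": [0.9, 0.4],
    }
)
print(filter_root_retrievals(retrieved, root_ids))
```

Run against the GCP DataFrame, this would raise with the list of columns actually present, which is the information the thread was trying to get at.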
Teodor C.
·
The GCP environment has the latest arize-phoenix and evals versions, while my local machine (the one that works) is running 3.16.3.
Mikyo
·
Interesting, Teo. Nothing strikes me as off immediately, other than that we have shipped some new instrumentation. Do you mind reminding me what type of application you are tracing? Does it use LlamaIndex?
Teodor C.
·
The original app does, yes: it uses LlamaIndex. Here I'm just loading a parquet file exported from the LlamaIndex app, and I'd like to run the retrieval evaluation on it.
Mikyo
·
I ask mainly because I want to confirm you are NOT using the new llama-index instrumentation (2.0): https://github.com/Arize-ai/openinference/tree/main/python/instrumentation/openinference-instrumentation-llama-index That is the only thing making me think you might have different traces in GCP than locally. Otherwise, I'm honestly a bit stumped. LMK how I can better understand the topology of the traces in the GCP setting.
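One way to answer that question is to list the installed distributions on each machine and compare. This is a minimal sketch using the standard library's `importlib.metadata`. The package list and the helper names are illustrative assumptions, and treating "major version 2 or later" as "the new instrumentation" is my reading of the 2.0 reference above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Distributions worth comparing between local and GCP (assumed list).
PACKAGES = (
    "arize-phoenix",
    "openinference-instrumentation-llama-index",
    "openinference-semantic-conventions",
)


def installed_versions(packages: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


def uses_llama_index_instrumentation_v2(versions: Dict[str, Optional[str]]) -> bool:
    """True if openinference-instrumentation-llama-index is at major version 2 or later."""
    v = versions.get("openinference-instrumentation-llama-index")
    if v is None:
        return False
    return int(v.split(".")[0]) >= 2


versions = installed_versions()
for name, v in versions.items():
    print(f"{name}: {v or 'not installed'}")
print("llama-index instrumentation 2.x:", uses_llama_index_instrumentation_v2(versions))
```

Running the same script on both machines and diffing the output makes a version mismatch visible without sharing any trace data.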

·

In production this is what we're using:

openinference-instrumentation==0.1.6
    # via arize-phoenix
    # via openinference-instrumentation-langchain
    # via openinference-instrumentation-llama-index
    # via openinference-instrumentation-openai
openinference-instrumentation-langchain==0.1.15
    # via arize-phoenix
openinference-instrumentation-llama-index==1.4.0
    # via arize-phoenix
    # via llama-index-callbacks-arize-phoenix
openinference-instrumentation-openai==0.1.5
    # via arize-phoenix
openinference-semantic-conventions==0.1.6
    # via arize-phoenix
    # via openinference-instrumentation
    # via openinference-instrumentation-langchain
    # via openinference-instrumentation-llama-index
    # via openinference-instrumentation-openai
opentelemetry-api==1.24.0
    # via openinference-instrumentation
    # via openinference-instrumentation-langchain
    # via openinference-instrumentation-llama-index
    # via openinference-instrumentation-openai
    # via opentelemetry-exporter-otlp-proto-grpc
    # via opentelemetry-exporter-otlp-proto-http
    # via opentelemetry-instrumentation
    # via opentelemetry-sdk
opentelemetry-exporter-otlp==1.24.0
    # via arize-phoenix

Teodor C.
·
And arize-phoenix==4.1.1. Keep in mind this is the app that generates the parquet dataset; afterwards, I import that dataset on another machine to run the evals.
Mikyo
·
Hmm, pretty stumped. Would it be possible for you to run df.head(5) or print some of the data in GCP before it goes through the evals? Or would that not be compliant? I'll sync with the team at standup this morning and see if they can think of anything.
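If printing rows is not compliant, a schema-only view may still be enough to spot a missing `reference` or `document_score` column. This is a minimal pandas sketch: `schema_summary` is a hypothetical helper that reports column names, dtypes, and non-null counts, but no cell values.

```python
import pandas as pd


def schema_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Describe each column (dtype, non-null count) without exposing any cell values."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "non_null": df.notna().sum(),
        }
    )


# Toy stand-in for retrieved_df; in practice, pass the real DataFrame.
example = pd.DataFrame({"parent_id": ["a", None], "reference": ["doc", "doc2"]})
print(len(example), "rows")
print(schema_summary(example))
```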
Teodor C.
·
see DM
Mikyo
·
Thank you Teodor. I think I've repro'd something. Not 100% sure. Will take a bit of digging.
Mikyo
·
I did want to highlight some changes to the query API in 4.0: Phoenix 4 and above no longer queries all traces by default, since that becomes too expensive now that we persist your data. Sorry this is badly documented. I've filed an issue to fix it, and we are actively working on versioned API reference docs, which probably would have helped. Basically, query_spans has additional parameters now:
def query_spans(
    self,
    *queries: SpanQuery,
    start_time: Optional[datetime] = None,
    end_time: Optional[datetime] = None,
    limit: Optional[int] = DEFAULT_SPAN_LIMIT,
    root_spans_only: Optional[bool] = None,
    project_name: Optional[str] = None,
) -> Optional[Union[pd.DataFrame, List[pd.DataFrame]]]:
    ...
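Because the default `limit` now caps how many spans come back, a query that returned all spans on 3.x may return only a subset on 4.x. The following is a minimal sketch, assuming the signature above, of a hypothetical helper that builds the new keyword arguments explicitly. Treating `limit=None` as "no cap" is an assumption based on the `Optional[int]` annotation, and passing a timezone-aware `start_time` is likewise an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def query_spans_kwargs(
    days_back: Optional[int] = None,
    limit: Optional[int] = None,
    project_name: Optional[str] = None,
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for query_spans (hypothetical helper).

    limit is always passed explicitly so the default cap is never applied by
    accident; None is assumed to mean "no cap".
    """
    kwargs: Dict[str, Any] = {"limit": limit}
    if days_back is not None:
        # Only spans starting within the last `days_back` days.
        now = now or datetime.now(timezone.utc)
        kwargs["start_time"] = now - timedelta(days=days_back)
    if project_name is not None:
        kwargs["project_name"] = project_name
    return kwargs


fixed_now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(query_spans_kwargs(days_back=7, now=fixed_now))
```

With a Phoenix 4 server running, this would be used as `px.Client().query_spans(retrieved_query, **query_spans_kwargs(days_back=7))`. Whether an unbounded query is wise depends on how much trace data you persist.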
Mikyo
·
Hey Teodor C., there is actually a slight format difference between v3 and v4. Roger Y. and I are putting together some code snippets to help you out.

·

Hi Teodor C., sorry for the inconvenience. Here’s the code I tested on your data. It should work with this extra step.

from dataclasses import replace

import pandas as pd

import phoenix as px
from phoenix.trace.attributes import flatten
from phoenix.trace.otel import decode_otlp_span, encode_span_to_otlp
from phoenix.trace.trace_dataset import TraceDataset

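# Path to the exported trace dataset.
file_name = "~/Downloads/trace_dataset.parquet"

# Extra step: flatten each span's attributes, then round-trip the span
# through OTLP encoding/decoding before rebuilding the TraceDataset.
ds = TraceDataset.from_spans(
    decode_otlp_span(encode_span_to_otlp(replace(span, attributes=dict(flatten(span.attributes)))))
    for span in TraceDataset(pd.read_parquet(file_name)).to_spans()
)
px.launch_app(trace=ds)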
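The extra step hinges on `flatten`. As a rough mental model only (this is not Phoenix's implementation, and its handling of lists may differ), flattening turns nested attribute dictionaries into dotted keys such as `input.value`, the same paths the `SpanQuery` earlier in the thread selects. Here is a hypothetical pure-Python sketch:

```python
from typing import Any, Iterator, Mapping, Tuple


def flatten_attributes(attributes: Mapping[str, Any], prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted_key, value) pairs from a nested attribute mapping.

    Simplified illustration: nested mappings are joined with ".", while
    lists (e.g. retrieval.documents) are kept as-is.
    """
    for key, value in attributes.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, Mapping):
            yield from flatten_attributes(value, dotted)
        else:
            yield dotted, value


nested = {
    "input": {"value": "What is Phoenix?"},
    "retrieval": {"documents": [{"document": {"content": "doc a", "score": 0.9}}]},
}
print(dict(flatten_attributes(nested)))
```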

Teodor C.
·
Seems to be working, Roger Y.! Thanks a lot, as always! One additional question (I can raise a separate thread if needed): I'm trying to use Claude 3.5 Sonnet via LiteLLM. So far, the only tutorial I've found covers Ollama running locally. How would one define the LiteLLM model and the RelevanceEvaluator so that they point to a machine running LiteLLM?