Questions on Query Behavior and Evaluation Metrics in Arize
Hello Arize team, I have two questions:
1 - Did something change with explode in span queries? When I run this command, I get an output:
import phoenix as px
from phoenix.trace.dsl import SpanQuery

query = SpanQuery().where(
    # the filter is a single string expression
    "span_kind == 'RERANKER'",
).select(
    # input="reranker.query",
    model="reranker.model_name",
)
reranked_docs_df = px.active_session().query_spans(query)
reranked_docs_df
However, if I run this command, I don't get anything back:
query = SpanQuery().where(
    "span_kind == 'RERANKER'",
).select(
    # input="reranker.query",
    model="reranker.model_name",
).explode(
    "reranker.output_documents",
    reference="reranker.document_content",
)
reranked_docs_df = px.active_session().query_spans(query)
reranked_docs_df
If I move the reference into the select instead, I get the list of output documents back:
query = SpanQuery().where(
    "span_kind == 'RERANKER'",
).select(
    # input="reranker.query",
    model="reranker.model_name",
    reference="reranker.output_documents",
)
reranked_docs_df = px.active_session().query_spans(query)
reranked_docs_df
2 - What is the suggested way of performing evaluations when query transformation is in place (the same query asked multiple times in different ways), like in the picture attached? I want to calculate DCG@5, Precision@5, and hit rate for the retrieval part, but I am trying to find the best way to capture the retrieval evaluation while keeping cost in mind, since I'm getting hundreds of documents per question.
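For reference, here is a minimal sketch of the three metrics in plain Python. It assumes each retrieved document already has a relevance label (1 for relevant, 0 for not) listed in ranked order. The function names and the sample labels are hypothetical and are not part of the Phoenix or Arize API.

```python
import math
from typing import Sequence


def precision_at_k(relevance: Sequence[int], k: int = 5) -> float:
    """Fraction of the top-k positions that hold a relevant document."""
    return sum(1 for r in relevance[:k] if r > 0) / k


def hit_rate_at_k(relevance: Sequence[int], k: int = 5) -> float:
    """1.0 if at least one relevant document appears in the top k, else 0.0."""
    return 1.0 if any(r > 0 for r in relevance[:k]) else 0.0


def dcg_at_k(relevance: Sequence[int], k: int = 5) -> float:
    """Discounted cumulative gain: each label is divided by log2(rank + 1)."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(relevance[:k]))


# Hypothetical labels for one question, in ranked order (not real data).
labels = [1, 0, 1, 0, 0, 1, 1]

p5 = precision_at_k(labels)   # 2 relevant in top 5 -> 0.4
hit5 = hit_rate_at_k(labels)  # 1.0
dcg5 = dcg_at_k(labels)       # 1/log2(2) + 1/log2(4) = 1.5
```

All three functions read only the first k labels, so for the @5 metrics only the five top-ranked documents of each query need a relevance label.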
