Comparing LlamaIndex Query Engines with Arize Phoenix Setup

·Oct 12, 2024 11:37 PM

I have two LlamaIndex query engines that I want to view as two different projects or with different metadata tags so I can compare the retrieved results. How can I achieve this?

```
# setup Arize Phoenix for logging/observability
import llama_index.core
import os

PHOENIX_API_KEY = os.getenv("PHOENIX_API_KEY")
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"api_key={PHOENIX_API_KEY}"
llama_index.core.set_global_handler(
    "arize_phoenix", endpoint="https://llamatrace.com/v1/traces"
)

query_engine = MultimodalQueryEngine(
    retriever=index.as_retriever(similarity_top_k=3),
    multi_modal_llm=azure_openai_mm_llm,
)
baseline_query_engine = MultimodalQueryEngine(
    retriever=baseline_index.as_retriever(similarity_top_k=3),
    multi_modal_llm=azure_openai_mm_llm,
)
```

5 comments


RunLLM
APP
·
To view the results of two different LlamaIndex query engines as separate projects or with different metadata tags, you can use the using_project context manager or add metadata to the spans. Here's how you can achieve this:
**Using Projects**
You can use the using_project context manager to switch between projects for each query engine:
```
# using_project ships with Phoenix (phoenix.trace), not openinference.instrumentation
from phoenix.trace import using_project
from llama_index.core import set_global_handler
import os

PHOENIX_API_KEY = os.getenv("PHOENIX_API_KEY")
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"api_key={PHOENIX_API_KEY}"
set_global_handler("arize_phoenix", endpoint="https://llamatrace.com/v1/traces")

query_engine = MultimodalQueryEngine(
    retriever=index.as_retriever(similarity_top_k=3),
    multi_modal_llm=azure_openai_mm_llm,
)
baseline_query_engine = MultimodalQueryEngine(
    retriever=baseline_index.as_retriever(similarity_top_k=3),
    multi_modal_llm=azure_openai_mm_llm,
)

# Traces from each block are sent to the named project
with using_project("Project_A"):
    result_a = query_engine.query("Your query here")

with using_project("Project_B"):
    result_b = baseline_query_engine.query("Your query here")
```

**Using Metadata**
Alternatively, you can add metadata to the spans to differentiate between the query engines:
```
from openinference.instrumentation import using_metadata
from llama_index.core import set_global_handler
import os

PHOENIX_API_KEY = os.getenv("PHOENIX_API_KEY")
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"api_key={PHOENIX_API_KEY}"
set_global_handler("arize_phoenix", endpoint="https://llamatrace.com/v1/traces")

query_engine = MultimodalQueryEngine(
    retriever=index.as_retriever(similarity_top_k=3),
    multi_modal_llm=azure_openai_mm_llm,
)
baseline_query_engine = MultimodalQueryEngine(
    retriever=baseline_index.as_retriever(similarity_top_k=3),
    multi_modal_llm=azure_openai_mm_llm,
)

# Add metadata to differentiate between query engines
with using_metadata({"engine": "query_engine"}):
    result_a = query_engine.query("Your query here")

with using_metadata({"engine": "baseline_query_engine"}):
    result_b = baseline_query_engine.query("Your query here")
```
Would you like more details on either approach?
Sources:
- GitHub Issue #3375
- GitHub Issue #2340
- Phoenix Tracing with Projects Tutorial
Farzad S.
·
RunLLM what is preferred if I want to run evaluations against 3 different query engines to compare?
RunLLM
APP
·
To run evaluations against three different query engines and compare their results, using metadata tags is generally preferred. This approach allows you to keep all the data within a single project while differentiating between the query engines using metadata. This makes it easier to compare results side-by-side.
Here's how you can achieve this:
**Using Metadata**
You can add metadata to the spans to differentiate between the query engines:
```
from openinference.instrumentation import using_metadata
from llama_index.core import set_global_handler
import os

PHOENIX_API_KEY = os.getenv("PHOENIX_API_KEY")
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"api_key={PHOENIX_API_KEY}"
set_global_handler("arize_phoenix", endpoint="https://llamatrace.com/v1/traces")

query_engine_1 = MultimodalQueryEngine(
    retriever=index1.as_retriever(similarity_top_k=3),
    multi_modal_llm=azure_openai_mm_llm,
)
query_engine_2 = MultimodalQueryEngine(
    retriever=index2.as_retriever(similarity_top_k=3),
    multi_modal_llm=azure_openai_mm_llm,
)
query_engine_3 = MultimodalQueryEngine(
    retriever=index3.as_retriever(similarity_top_k=3),
    multi_modal_llm=azure_openai_mm_llm,
)

with using_metadata({"engine": "query_engine_1"}):
    result_1 = query_engine_1.query("Your query here")

with using_metadata({"engine": "query_engine_2"}):
    result_2 = query_engine_2.query("Your query here")

with using_metadata({"engine": "query_engine_3"}):
    result_3 = query_engine_3.query("Your query here")
```
This approach allows you to tag each query with metadata indicating which query engine was used. You can then filter and compare the results based on these tags. Would you like more details on setting up the evaluations or any other aspect?
Sources:
- [GitHub Issue #3375](https://github.com/Arize-ai/phoenix/issues/3375)
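For a comparison across several engines, the three `with` blocks can be folded into a loop that sends the same queries to every engine and tags each call with the engine's name. This is only a minimal sketch: `run_comparison` and `EchoEngine` are hypothetical names introduced for illustration, `EchoEngine` stands in for the real query engines, and the no-op fallback for `using_metadata` exists only so the sketch runs where the tracing packages are not installed.

```python
from contextlib import nullcontext

try:
    # Tagging context manager used in the examples above.
    from openinference.instrumentation import using_metadata
except ImportError:
    # Assumption for illustration only: a no-op stand-in so the sketch
    # still runs without the tracing packages (nothing gets tagged).
    def using_metadata(metadata):
        return nullcontext()


def run_comparison(engines, queries):
    """Run every query through every engine, tagging each call with the engine name."""
    rows = []
    for query in queries:
        for name, engine in engines.items():
            with using_metadata({"engine": name}):
                response = engine.query(query)
            rows.append({"engine": name, "query": query, "response": str(response)})
    return rows


class EchoEngine:
    """Hypothetical stand-in for a real query engine; only .query() is assumed."""

    def __init__(self, label):
        self.label = label

    def query(self, text):
        return f"{self.label}: {text}"


engines = {
    "query_engine_1": EchoEngine("engine-1"),
    "query_engine_2": EchoEngine("engine-2"),
    "query_engine_3": EchoEngine("engine-3"),
}
rows = run_comparison(engines, ["Your query here"])
for row in rows:
    print(row["engine"], "->", row["response"])
```

Keeping the local `rows` list alongside the traces gives a quick side-by-side view of the responses, while the `engine` tag is what lets the traces themselves be filtered per engine.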
Farzad S.
·
My traces seem to be duplicating for some reason?
Roger Y.
·
the start time and latencies are different, so they aren’t exact duplicates. they were probably separate runs
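The check described here can be sketched in a few lines: two traces count as exact duplicates only if their start time and latency match as well as their name and input. The record layout (`name`, `input`, `start_time`, `latency_ms`) and the sample values are made up for illustration and are not the Phoenix export schema.

```python
from collections import defaultdict


def find_exact_duplicates(spans):
    """Group spans that match on name, input, start time, and latency."""
    groups = defaultdict(list)
    for span in spans:
        key = (span["name"], span["input"], span["start_time"], span["latency_ms"])
        groups[key].append(span)
    # Only groups with more than one member are true duplicates.
    return [group for group in groups.values() if len(group) > 1]


# Made-up sample records: same query, but different start times and latencies.
spans = [
    {"name": "query", "input": "Your query here",
     "start_time": "2024-10-12T23:30:01Z", "latency_ms": 1840},
    {"name": "query", "input": "Your query here",
     "start_time": "2024-10-12T23:31:15Z", "latency_ms": 2105},
]

duplicates = find_exact_duplicates(spans)
print("exact duplicates found:", len(duplicates))
```

If this returns no groups while the trace list still shows repeated rows, the repeats most likely come from the query being executed more than once.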
