Troubleshooting Incremental Document Retrieval in Arize with LlamaIndex

Apoorva J. · 2024-09-19T22:29:38.785Z

Hi! I'm trying to compare different retrieval strategies (MODES) using Arize and LlamaIndex. Here's a snippet from my code:

    for mode in MODES:
        query_engine = vector_store_index.as_query_engine(similarity_top_k=5, vector_store_query_mode=mode)
        px.close_app()
        px.launch_app()
        for query in queries[0:2]:
            query_engine.query(query)
        retrieved_documents = get_retrieved_documents(px.active_session())

I'm relaunching Phoenix for each mode, but the length of retrieved_documents keeps increasing with each mode despite the reset, which leads me to believe that it's not actually relaunching. Am I doing something wrong?

RunLLM
APP
·
It appears that the px.close_app() method is not fully clearing the session data before relaunching Phoenix. This could be causing the incremental increase in the length of retrieved_documents. To ensure that all stored data is deleted, you should use the delete_data=True parameter with the close_app method.

Here's the corrected code snippet:

    for mode in MODES:
        query_engine = vector_store_index.as_query_engine(similarity_top_k=5, vector_store_query_mode=mode)
        px.close_app(delete_data=True)
        px.launch_app()
        for query in queries[0:2]:
            query_engine.query(query)
        retrieved_documents = get_retrieved_documents(px.active_session())

Would you like more detail?

Sources:
- https://github.com/Arize-ai/phoenix/issues/2218
- https://github.com/Arize-ai/phoenix/issues/2047
💡 Hint: Mention RunLLM in the thread for followups.
RunLLM
APP
·
To switch projects using the using_project context manager, you can wrap your code for each mode within the context manager. This will ensure that traces for each mode are sent to a separate project. Here's how you can modify your code:

    from phoenix.trace import using_project

    for mode in MODES:
        project_name = f"project_{mode}"
        with using_project(project_name):
            query_engine = vector_store_index.as_query_engine(similarity_top_k=5, vector_store_query_mode=mode)
            for query in queries[0:2]:
                query_engine.query(query)
            retrieved_documents = get_retrieved_documents(px.active_session())
            print(f"Retrieved documents for {mode}: {retrieved_documents}")

This code will dynamically switch the project for each mode, ensuring that traces are separated by project.

Would you like more details on any specific part?

Sources:
- using_project context manager
- Phoenix Tracing Documentation
Apoorva J.
·
RunLLM That doesn't seem to work. That still gives me the incremental error.
Mikyo
·
RunLLM is a bit wrong there. You will need to specify the project in the client. RunLLM can you show how to get retrieved documents from a given project using px.Client instead of the session?
RunLLM
APP
·
To get retrieved documents from a specific project using px.Client, you need to specify the project_name parameter in the get_retrieved_documents function. Here's how you can modify your code:

    from phoenix.trace import using_project
    import phoenix as px

    client = px.Client()

    for mode in MODES:
        project_name = f"project_{mode}"
        with using_project(project_name):
            query_engine = vector_store_index.as_query_engine(similarity_top_k=5, vector_store_query_mode=mode)
            for query in queries[0:2]:
                query_engine.query(query)
        retrieved_documents = get_retrieved_documents(client, project_name=project_name)
        print(f"Retrieved documents for {mode} in project {project_name}: {retrieved_documents}")

This ensures that the retrieved documents are fetched from the specified project.

Would you like more details on any specific part?

Sources:
- get_retrieved_documents function
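As a rough illustration of why scoping the query by project matters, here is a toy sketch in plain Python. It does not use Phoenix at all: `ToySpanStore`, `record`, `retrieved_documents`, and `run_modes` are hypothetical names invented for this example, and the mode names are placeholders. It only models the behaviour described in the thread, where an unscoped read sees everything logged so far while a read filtered by project name sees one mode's documents.

```python
from collections import defaultdict


class ToySpanStore:
    """Hypothetical stand-in for a trace backend; this is not Phoenix's API."""

    def __init__(self):
        # Retrieved-document rows, grouped by the project they were logged to.
        self._docs_by_project = defaultdict(list)

    def record(self, project_name, query, documents):
        """Log the documents retrieved for one query under a project."""
        for rank, doc in enumerate(documents):
            self._docs_by_project[project_name].append(
                {"query": query, "rank": rank, "document": doc}
            )

    def retrieved_documents(self, project_name=None):
        """Return rows for one project, or for all projects if none is given."""
        if project_name is None:
            return [row for rows in self._docs_by_project.values() for row in rows]
        return list(self._docs_by_project.get(project_name, []))


def run_modes(store, modes, queries, top_k=5):
    """Simulate the per-mode loop and count rows with and without scoping."""
    counts_unscoped = {}
    counts_scoped = {}
    for mode in modes:
        project_name = f"project_{mode}"
        for query in queries:
            # Fake retrieval: top_k placeholder documents per query.
            docs = [f"{mode}-doc-{i}" for i in range(top_k)]
            store.record(project_name, query, docs)
        counts_unscoped[mode] = len(store.retrieved_documents())
        counts_scoped[mode] = len(store.retrieved_documents(project_name))
    return counts_unscoped, counts_scoped


unscoped, scoped = run_modes(ToySpanStore(), ["default", "hybrid", "mmr"], ["q1", "q2"])
print(unscoped)  # the unscoped count grows with every mode
print(scoped)    # the scoped count stays the same for every mode
```

With two queries and five documents per query, the unscoped count climbs by ten for each mode, which mirrors the incremental lengths reported above, while the project-scoped count stays at ten.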
Apoorva J.
·
Ahh but I'm not using the hosted version of Arize so don't really have a client defined.
Apoorva J.
·
OH!?
Apoorva J.
·
Didn't know that!
🙌1
Apoorva J.
·
At least with close/launch app, the incremental number was coming from close_app not clearing the session data.
Mikyo
·
Yeah you still have a local database file that's not closed
Mikyo
·
can you try px.delete_all() after close_app?
Mikyo
·
RunLLM can you describe how px.delete_all works? Provide the API documentation
RunLLM
APP
·
The px.delete_all() function in Phoenix is used to delete all persisted data under the PHOENIX_WORKING_DIR, effectively resetting your session for the next launch. This includes traces, evaluations, and any other data stored in the working directory.

Here's how you can use it in your code:

    for mode in MODES:
        query_engine = vector_store_index.as_query_engine(similarity_top_k=5, vector_store_query_mode=mode)
        px.close_app(delete_data=True)
        px.delete_all()
        px.launch_app()
        for query in queries[0:2]:
            query_engine.query(query)
        retrieved_documents = get_retrieved_documents(px.active_session())
        print(f"Retrieved documents for {mode}: {retrieved_documents}")

This ensures that all session data is cleared before relaunching Phoenix.

Would you like more details on any specific part?

Sources:
- Phoenix Session API
- Phoenix Session Code
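To make the "local database file" point concrete, here is a minimal sketch using only the standard library. It assumes, as the thread suggests, that data persisted to a file in a working directory survives a close and relaunch. The `spans` table, the `toy.sqlite` file name, and the `log_rows` helper are all made up for illustration and say nothing about how Phoenix actually stores or deletes its data.

```python
import os
import sqlite3
import tempfile


def log_rows(db_path, n):
    """Append n rows to the database at db_path and return the total row count."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS spans (id INTEGER PRIMARY KEY, payload TEXT)"
        )
        conn.executemany(
            "INSERT INTO spans (payload) VALUES (?)",
            [(f"doc-{i}",) for i in range(n)],
        )
        conn.commit()
        return conn.execute("SELECT COUNT(*) FROM spans").fetchone()[0]
    finally:
        # Closing the connection alone does not remove the file on disk.
        conn.close()


with tempfile.TemporaryDirectory() as working_dir:
    db_path = os.path.join(working_dir, "toy.sqlite")

    # "Close and reopen" three times without deleting the file: rows accumulate.
    without_reset = [log_rows(db_path, 10) for _ in range(3)]

    # Delete the persisted file before each run: every run starts from zero.
    with_reset = []
    for _ in range(3):
        if os.path.exists(db_path):
            os.remove(db_path)
        with_reset.append(log_rows(db_path, 10))

print(without_reset)
print(with_reset)
```

The first list grows on every run because each new connection reopens the same file, while the second stays flat because the file is removed before each run. That is the same pattern as the difference between plain close/launch and deleting the persisted data first.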
Apoorva J.
·
Hah setting delete_data to True had the same effect. I’m going to try the project method. Seems cleaner and what one would do in a prod-like setup.
Apoorva J.
·
Thanks for the suggestions!

16 comments