Hello everyone, I am trying to run evaluations using Ollama and LiteLLM. I've been following the evaluate_rag.ipynb notebook and am running it as a function in a Streamlit app instead of running it in a notebook. I have two issues:
The relevance evaluator is taking a super long time (> 5 minutes for 30 traces). Is this normal?
I get a warning saying "WARNING [phoenix.evals.executors] Async evals execution is not supported in non-main threads. Falling back to sync." Is there a way to force the run_evals function to run asynchronously?
I would greatly appreciate anyone's help!
1. Long Evaluation Time: The long evaluation time for the relevance evaluator can be due to several factors, including the complexity of the model and the data being processed. Falling back to synchronous execution can also significantly slow down the process, since you lose the benefit of any concurrency you configured.
2. Async Execution Warning: The warning indicates that asynchronous evaluation is not supported in non-main threads. To force run_evals to run asynchronously, ensure it is executed in the main thread. If you are running in a Streamlit app, you may need to adjust your code so the evaluations are not called from a worker thread.

Would you like more detail on how to adjust your Streamlit app to support asynchronous evaluations?

Sources:
- Support Phoenix Evals in threads when run_sync=True, support Evals in thread callbacks
- Slack Discussion on nest_asyncio and run_sync
💡 Hint: Mention RunLLM in the thread for followups.
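For reference, the fallback condition described above can be checked directly with the standard threading module: the async executor falls back to sync whenever it is not running on the process's main thread. A minimal sketch (the helper name is made up for illustration):

```python
import threading

def on_main_thread() -> bool:
    # Phoenix's async eval executor falls back to sync when this is False,
    # because its signal-based cancellation only works on the main thread.
    return threading.current_thread() is threading.main_thread()

print(on_main_thread())  # True in a plain script; False inside a Streamlit script thread
```

Calling this right before run_evals is a quick way to confirm whether Streamlit has handed your code to a worker thread.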
have you tried adding the following to your code?
import nest_asyncio
nest_asyncio.apply()

actually, is there any chance you can give a small snippet I can use to try and repro your issue?
Hi Dustin N., I'm not running anything in a Jupyter notebook. I've broken up the evaluate_rag.ipynb code into multiple async functions, which I then call in the Streamlit app.
async def _executor(self):
    nest_asyncio.apply()
    session = px.launch_app()
    # print(f"🚀 Open the Phoenix UI if you haven't already: {session.url}")
    await self._generate_traces()
    await self._relevanceEvaluator()
    await self._qa_hallucination_evaluator()

All the async functions contain pretty much the same code as the notebook, with the exception of the model. Sometimes I use OpenAIModel and sometimes LiteLLM.
hi Beijun D., after doing some investigation, it appears that Streamlit doesn't actually run async code in an exposed event loop on the main thread, but instead in "script threads": source. We wrote our evaluators to respond to termination signals using the signal library, which fails when not run on the main thread. If we need to run this in a worker thread, we'd have to wire up a bypass for responding to termination signals, but that might result in a lot of wasted API requests in some cases. How big is the payload you're passing into llm_classify here?
ah, basically after digging I don't think there's a great way to achieve this given how Streamlit is architected. We'd essentially have to remove both the rate limiter (which prevents you from exceeding the rate limit too often) and the signal handling.
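One workaround that might be worth exploring (not verified against Phoenix's rate limiter, and the helper names below are hypothetical): run the evaluations in a separate process, since code launched in a child process executes on that process's own main thread, where the signal-based cancellation is allowed to install its handlers. A minimal sketch:

```python
import multiprocessing as mp
import threading

def evals_entry(queue):
    # In the child process this function runs on that process's MAIN
    # thread, so the main-thread-only signal handling is no longer an
    # issue. In a real app you would call phoenix.evals.run_evals(...)
    # here and put its result on the queue instead of this thread check.
    queue.put(threading.current_thread() is threading.main_thread())

def run_evals_in_child_process() -> bool:
    # "fork" keeps this sketch simple on POSIX; in a Streamlit app
    # "spawn" may be safer so the child doesn't inherit script-thread state.
    ctx = mp.get_context("fork")
    queue = ctx.Queue()
    proc = ctx.Process(target=evals_entry, args=(queue,))
    proc.start()
    on_main = queue.get()
    proc.join()
    return on_main

print(run_evals_in_child_process())  # prints True: the child runs on a main thread
```

The trade-off is serializing the input DataFrame across the process boundary and losing the live Streamlit session state inside the child, so this is a sketch to experiment with rather than a drop-in fix.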
Thank you for trying! I appreciate all of the help!
of course, I'm sorry it's not quite as easy as we'd hoped but I'm happy to help work through alternative solutions for you
