Parallelize experiments: We have large datasets in Phoenix with > 15k samples. I noticed that running experiments with large datasets is very slow. The data samples are seemingly processed sequentially. Is there a way to parallelize the processing, especially when hitting model providers like OpenAI? Parallelizing experiments is also beneficial when testing open-source LLMs like Scout. Setting up test instances is expensive, and those instances, like H100s, can handle multiple requests at the same time. It would reduce prototyping costs drastically if the requests could be parallelized.
Yes, Phoenix supports parallelization of experiment runs via asynchronous execution and concurrency controls. For OpenAI and similar providers, Phoenix uses an async executor to submit requests concurrently, maximizing throughput and handling rate limits efficiently. You can control concurrency by setting the `concurrency` parameter in functions like `llm_classify` or `run_experiment`. This approach also works for open-source LLMs, provided your task function is written as a coroutine and the model supports async calls. In notebook environments, patching the event loop with `nest_asyncio.apply()` is recommended for full async support. For example, to run an experiment with high concurrency:

```python
import nest_asyncio
nest_asyncio.apply()

experiment = run_experiment(
    dataset=dataset,
    task=task,
    evaluators=evaluators,
    concurrency=30,  # adjust based on your hardware and provider limits
)
```

See the Phoenix LLM Evals docs, the AsyncExecutor PR, and the example issue for details. Would you like more detail on configuring concurrency or async tasks?

Sources:
- Phoenix LLM Evals docs
- AsyncExecutor PR
- Concurrency issue example
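To make the "task function written as a coroutine" point concrete, here is a minimal, self-contained sketch of bounding concurrent calls with a semaphore, which mirrors the kind of throttling Phoenix's async executor applies when you pass `concurrency=`. The `call_model` stub is hypothetical and stands in for a real async provider call (e.g. an `AsyncOpenAI` request or a request to a self-hosted endpoint):

```python
import asyncio

async def call_model(prompt: str) -> str:
    # Hypothetical stand-in for a real async model call;
    # the sleep simulates network latency.
    await asyncio.sleep(0.01)
    return f"response to: {prompt}"

async def run_all(prompts, concurrency=30):
    # Bound the number of in-flight requests, similar to what an
    # async executor does with a concurrency limit.
    sem = asyncio.Semaphore(concurrency)

    async def bounded(prompt):
        async with sem:
            return await call_model(prompt)

    return await asyncio.gather(*(bounded(p) for p in prompts))

results = asyncio.run(run_all([f"sample {i}" for i in range(100)]))
```

With the simulated 10 ms latency, 100 sequential calls would take about a second, while the bounded-concurrency version finishes in roughly `100 / 30` batches, which is where the prototyping-cost savings on shared GPU instances come from.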
💡 Hint: Mention RunLLM in the thread for followups.
The answer ^^^ is perfect. Let me try it out!
Note that the `nest_asyncio.apply()` step is only relevant if you're in an environment with an already running event loop (e.g. Jupyter notebooks).
