Parallelize experiments: We have large datasets in Phoenix with > 15k samples. I noticed that running experiments with large datasets is very slow. The data samples are seemingly processed sequentially. Is there a way to parallelize the processing, especially when hitting model providers like OpenAI? Parallelizing experiments is also beneficial when testing open-source LLMs like Scout. Setting up test instances is expensive, and those instances, like H100s, can handle multiple requests at the same time. It would reduce prototyping costs drastically if the requests could be parallelized.
Yes, Phoenix supports parallelization of experiment runs via asynchronous execution and concurrency controls. For OpenAI and similar providers, Phoenix uses an async executor to submit requests concurrently, maximizing throughput and handling rate limits efficiently. You can control concurrency by setting the `concurrency` parameter in functions like `llm_classify` or `run_experiment`. This approach also works for open-source LLMs, provided your task function is written as a coroutine and the model supports async calls. In notebook environments, patching the event loop with `nest_asyncio.apply()` is recommended for full async support. For example, to run an experiment with high concurrency:

```python
import nest_asyncio
nest_asyncio.apply()

experiment = run_experiment(
    dataset=dataset,
    task=task,
    evaluators=evaluators,
    concurrency=30,  # adjust based on your hardware and provider limits
)
```

See the Phoenix LLM Evals docs, the AsyncExecutor PR, and the example issue for details. Would you like more detail on configuring concurrency or async tasks?

Sources:
- Phoenix LLM Evals docs
- AsyncExecutor PR
- Concurrency issue example
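To make the "task function written as a coroutine" point concrete, here is a minimal, self-contained sketch of bounding concurrent calls with a semaphore, which mirrors the kind of throttling Phoenix's async executor applies when you pass `concurrency=`. The `call_model` stub is hypothetical and stands in for a real async provider call (e.g. an `AsyncOpenAI` request or a request to a self-hosted endpoint):

```python
import asyncio

async def call_model(prompt: str) -> str:
    # Hypothetical stand-in for a real async model call;
    # the sleep simulates network latency.
    await asyncio.sleep(0.01)
    return f"response to: {prompt}"

async def run_all(prompts, concurrency=30):
    # Bound the number of in-flight requests, similar to what an
    # async executor does with a concurrency limit.
    sem = asyncio.Semaphore(concurrency)

    async def bounded(prompt):
        async with sem:
            return await call_model(prompt)

    return await asyncio.gather(*(bounded(p) for p in prompts))

results = asyncio.run(run_all([f"sample {i}" for i in range(100)]))
```

With the simulated 10 ms latency, 100 sequential calls would take about a second, while the bounded-concurrency version finishes in roughly `100 / 30` batches, which is where the prototyping-cost savings on shared GPU instances come from.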
💡 Hint: Mention RunLLM in the thread for followups.
The answer ^^^ is perfect. Let me try it out!
Note that the `nest_asyncio.apply()` step is only relevant if you're in an environment with an already running event loop (e.g. Jupyter notebooks).
