In the production guide, it mentions horizontal scaling, which is good, but not much other detail. Wondering if there are any gotchas with running more than one instance at a time? Assuming it's safe, but hoping there's a pre-existing answer to this.
Phoenix is stateless and supports horizontal scaling; you can run multiple Phoenix containers backed by a single Postgres database. This is a supported and safe approach for handling increased concurrency and workload, as confirmed by the Phoenix team and production guide.

However, you should monitor for potential database contention (e.g., deadlocks) under high ingestion load, and tune your batch processor and database settings as needed for your workload. Automated migrations are handled at startup, but using a dedicated database for Phoenix is recommended to avoid conflicts with other applications.

No unique "gotchas" are documented for running multiple instances, but you should ensure proper resource scaling (memory, disk, and database), enable batch processing, and monitor for any database performance issues. If you experience deadlocks or out-of-memory issues, consider tuning batch sizes, queue sizes, and database connection pooling, and avoid running multiple Phoenix pods against the same database without affinity if you see contention issues.

Would you like more detail on best practices for scaling or database tuning?

Sources:
- Phoenix deployment Slack discussion
- Phoenix deadlock troubleshooting
- Phoenix out-of-memory troubleshooting
To my knowledge, Phoenix is fully robust to horizontal scaling. Tagging 🔒[private user] in case he has any insight here.
For instance, our ingest performance does not scale well with node count due to backend limitations.
And on some limited occasions you might even run into ingestion problems, though we believe those cases are rare.
Ah ok, interesting. We are having ingest problems where large volumes of spans are causing our instance to become unavailable. We see memory maxing out when this happens (4 GB / 2 vCPU).
My assumption is the batches are queued in memory for processing and aren't being processed fast enough, leading to this backup. But maybe the problem is elsewhere?
Hey Peter K., this is an ongoing issue we're investigating. Currently spans can get backed up because ingest is performance-limited by multiple mechanisms, and we're investigating ways to improve it. We recently added a limit on the ingestion queue to prevent the server from running out of memory in these cases. Can you verify that your Phoenix is on the latest version?
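To make the queue-limit mechanism concrete, here is an illustrative stdlib sketch (not Phoenix's actual code) of a bounded ingestion queue: once the cap is hit, new batches are rejected and counted rather than accumulating in memory until the process is OOM-killed.

```python
# Illustrative sketch of a bounded ingestion queue; names and the
# tiny cap are hypothetical, chosen only to show the behavior.
import queue

ingest_queue = queue.Queue(maxsize=3)  # small cap for demonstration
dropped = 0

for batch_id in range(5):
    try:
        ingest_queue.put_nowait(f"span-batch-{batch_id}")
    except queue.Full:
        dropped += 1  # surfaced to the sender as an error instead of growing memory

print(ingest_queue.qsize(), dropped)  # → 3 2
```

This is also why the error appeared only after updating: before the limit existed, the same backlog silently consumed memory instead of being rejected.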
Ah yep, that explains it. We updated today. We were not getting this error before.
Interestingly, we weren't filling up our memory before either though 🤔
