In the production guide, it mentions horizontal scaling, which is good, but not much other detail. Wondering if there are any gotchas with running more than one instance at a time? Assuming it's safe, but hoping there's a pre-existing answer to this.
Phoenix is stateless and supports horizontal scaling; you can run multiple Phoenix containers backed by a single Postgres database. This is a supported and safe approach for handling increased concurrency and workload, as confirmed by the Phoenix team and production guide.

However, you should monitor for potential database contention (e.g., deadlocks) under high ingestion load, and tune your batch processor and database settings as needed for your workload. Automated migrations are handled at startup, but using a dedicated database for Phoenix is recommended to avoid conflicts with other applications.

No unique "gotchas" are documented for running multiple instances, but you should ensure proper resource scaling (memory, disk, and database), enable batch processing, and monitor for any database performance issues. If you experience deadlocks or out-of-memory issues, consider tuning batch sizes, queue sizes, and database connection pooling, and avoid running multiple Phoenix pods against the same database without affinity if you see contention issues.

Would you like more detail on best practices for scaling or database tuning?

Sources:
- Phoenix deployment Slack discussion
- Phoenix deadlock troubleshooting
- Phoenix out-of-memory troubleshooting
To my knowledge, Phoenix is fully robust to horizontal scaling. Tagging 🔒[private user] in case he has any insight here.
For instance, our ingest performance does not scale well with node count due to backend limitations.
And on some limited occasions you might even run into ingestion problems, though we believe those cases are rare.
Ah ok, interesting. We are having ingest problems where large volumes of spans are causing our instance to become unavailable. We see memory maxing out when this happens (4 GB / 2 vCPU).
My assumption is the batches are queued in memory for processing and aren't being processed fast enough, leading to this backup. But maybe the problem is elsewhere?
Hey Peter K., this is an ongoing issue we're investigating. Currently spans can get backed up because ingest is performance-limited by multiple mechanisms, and we're investigating ways to improve it. We recently added a limit on the ingestion queue to prevent the server from running out of memory in these cases. Can you verify that your Phoenix is on the latest version?
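To make the queue-limit mechanism concrete, here is an illustrative stdlib sketch (not Phoenix's actual code) of a bounded ingestion queue: once the cap is hit, new batches are rejected and counted rather than accumulating in memory until the process is OOM-killed.

```python
# Illustrative sketch of a bounded ingestion queue; names and the
# tiny cap are hypothetical, chosen only to show the behavior.
import queue

ingest_queue = queue.Queue(maxsize=3)  # small cap for demonstration
dropped = 0

for batch_id in range(5):
    try:
        ingest_queue.put_nowait(f"span-batch-{batch_id}")
    except queue.Full:
        dropped += 1  # surfaced to the sender as an error instead of growing memory

print(ingest_queue.qsize(), dropped)  # → 3 2
```

This is also why the error appeared only after updating: before the limit existed, the same backlog silently consumed memory instead of being rejected.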
Ah yep, that explains it. We updated today. We were not getting this error before.
Interestingly, we weren't filling up our memory before either though 🤔
