Open-source Phoenix is not optimized for very large-scale deployments (hundreds of millions of spans, multi-terabyte databases). Common issues at this scale include slow queries, UI timeouts, and database contention, especially when the spans table is large and during migrations. Recommended approaches include: (1) aggressive data retention policies to keep the database size manageable, (2) regular pruning of low-value spans, (3) scaling up database resources (CPU, memory, parallel workers), and (4) scaling Phoenix horizontally by running multiple Phoenix containers behind a load balancer, all sharing a single Postgres instance. For even larger workloads or OLAP use cases, Arize AX (the enterprise product) uses a proprietary database designed for high-volume analytics, which is not available in open-source Phoenix. Open-source Phoenix has no documented support for sharding or distributed Postgres, so performance bottlenecks are expected at this scale unless you move to a more specialized backend or product.
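One common way to enforce a retention policy is a scheduled job that deletes old spans in small batches instead of issuing one huge `DELETE`, which keeps each transaction short and limits the database contention described above. The sketch below is hypothetical: it runs against an in-memory SQLite database with a made-up `spans` table (integer `start_time` in Unix seconds) so it is self-contained, and it is not Phoenix's actual schema. Before pointing anything like this at a real Phoenix Postgres database, check the real tables, foreign keys, and whether your Phoenix version already offers built-in retention settings. The `DELETE ... WHERE id IN (SELECT ... LIMIT ?)` pattern itself is also valid SQL in Postgres.

```python
import sqlite3
import time

# Hypothetical schema for illustration only; this is NOT Phoenix's real table layout.
SCHEMA = """
CREATE TABLE spans (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    start_time INTEGER NOT NULL  -- Unix epoch seconds
)
"""


def delete_old_spans(conn: sqlite3.Connection, cutoff: int, batch_size: int = 1000) -> int:
    """Delete spans with start_time < cutoff, batch_size rows at a time.

    Each batch runs in its own short transaction, so locks are held briefly
    instead of for one long DELETE over the whole table.
    Returns the total number of rows deleted.
    """
    total = 0
    while True:
        with conn:  # commits this batch on success, rolls back on error
            cur = conn.execute(
                """
                DELETE FROM spans
                WHERE id IN (
                    SELECT id FROM spans
                    WHERE start_time < ?
                    ORDER BY id
                    LIMIT ?
                )
                """,
                (cutoff, batch_size),
            )
        deleted = cur.rowcount
        total += deleted
        if deleted < batch_size:  # fewer than a full batch means nothing is left to delete
            return total


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    now = int(time.time())
    day = 86_400
    # Synthetic demo data: 2,500 spans that are 40 days old and 500 spans from now.
    rows = [(f"old-{i}", now - 40 * day) for i in range(2_500)]
    rows += [(f"new-{i}", now) for i in range(500)]
    with conn:
        conn.executemany("INSERT INTO spans (name, start_time) VALUES (?, ?)", rows)

    retention_days = 30  # illustrative value; choose based on your own needs
    removed = delete_old_spans(conn, cutoff=now - retention_days * day, batch_size=1000)
    remaining = conn.execute("SELECT COUNT(*) FROM spans").fetchone()[0]
    print(removed, remaining)  # 2500 500
```

The batch size trades total runtime against how long each transaction holds locks; smaller batches are gentler on a busy database. After large deletes in Postgres, autovacuum (or a manual `VACUUM`) is what actually makes the freed space reusable.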
Recent improvements in Phoenix include UI lazy loading, pagination, and query optimizations, but these only partially mitigate issues with very large datasets. For production, it's recommended to monitor memory and disk usage, tune Postgres (indexes, query plans, connection pooling), and consider splitting workloads across multiple Phoenix instances if organizationally feasible. If you continue to experience performance issues, you may be hitting the architectural limits of open-source Phoenix and Postgres for this use case. Would you like more detail on specific tuning steps or architectural alternatives?
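For the disk-usage monitoring mentioned above, a minimal sketch using only the Python standard library (`shutil.disk_usage`) is below. The function names and the 80% threshold are illustrative assumptions, and in most production setups this check would live in an existing monitoring stack rather than a standalone script.

```python
import shutil


def usage_fraction(path: str) -> float:
    """Fraction of the filesystem containing `path` that is currently used."""
    usage = shutil.disk_usage(path)  # named tuple of bytes: total, used, free
    return usage.used / usage.total


def check_disk(path: str, threshold: float = 0.80) -> bool:
    """Print a warning and return True if usage at `path` is above `threshold`."""
    fraction = usage_fraction(path)
    if fraction > threshold:
        print(f"WARNING: {path} is {fraction:.0%} full (threshold {threshold:.0%})")
        return True
    return False
```

For example, a cron job could call `check_disk("/var/lib/postgresql/data")`, assuming that is where your Postgres data directory lives (it is the default in the official Postgres Docker image, but deployments vary), and alert when it returns `True`.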
Sources:
- <https://arize.com/docs/phoenix/self-hosting/architecture|Phoenix architecture and scaling>
- <https://arize.com/docs/phoenix/production-guide|Production guide: scaling and resource planning>
- <https://github.com/Arize-ai/phoenix/issues/6201|UI slowdowns>
- <https://github.com/Arize-ai/phoenix/issues/8420|database tuning>