Scaling Phoenix for High Volume: Strategies to Handle 200M+ Spans and Large Data Sets
Hey! We've been using Phoenix in production for about six months. Is there anything folks are doing to enable larger scale? We peaked at about 200M spans and a little over 2TB, and at that point our Phoenix deployment was more or less non-functional. We're running 13.18 after a recent upgrade. As an attempt to resolve this, I pruned ~100M spans that were low value, and we already have data retention policies in place. Are there common approaches folks are using to run Phoenix at this kind of scale?

As an aside, I saw above that others also hit issues on larger spans tables during the 12->13 upgrade, with the migration taking out a lock on the spans table. I got through that by finding the CREATE INDEX statement that was attempting to run, scaling in all of the service tasks to remove write contention, and then scaling up maintenance_work_mem and the parallel worker count and executing the index creation manually.
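For anyone hitting the same migration lock, a rough sketch of that manual index build (assuming a Postgres backend; the index name and column here are hypothetical, so substitute the exact CREATE INDEX statement from the failed migration):

```sql
-- Session-level tuning for a one-off index build; values are illustrative
-- and should be sized to the instance's available memory and cores.
SET maintenance_work_mem = '4GB';
SET max_parallel_maintenance_workers = 8;

-- Hypothetical index definition: replace with the statement the
-- migration was attempting. CONCURRENTLY avoids a long write lock on
-- the spans table, though with all writers scaled in a plain build
-- works too. Note that CREATE INDEX CONCURRENTLY cannot run inside a
-- transaction block.
CREATE INDEX CONCURRENTLY IF NOT EXISTS ix_spans_example
    ON spans (trace_rowid);
```

Once the index exists, re-running the migration should see it already in place (or you can mark the migration as applied, depending on how the migration tool tracks state).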
