Self-hosted Phoenix: PostgreSQL storage bloat not reclaimed by retention policy or VACUUM
Hi, we are running a self-hosted Phoenix instance and experiencing a critical database storage issue: our PostgreSQL database has grown to 291GB, with the vast majority being external storage that is not reclaimed by either the Data Retention Policy or manual VACUUM operations.

Environment Details:
- Phoenix Version: 12.11.1
- Deployment: Google Cloud Run with external PostgreSQL database (via private IP)
- Database: PostgreSQL (Google Cloud SQL)
- Current Data: ~720,000 frontend traces and ~650,000 backend traces

Database Size Breakdown:
- Total database size: 291GB
- `spans` table total size: 290GB
- Table size (actual rows): 13GB
- External size (TOAST/indexes/etc.): 277GB (95% of total)

Issue Description:
Despite configuring and running the Data Retention Policy to delete old traces, the database continues to grow. Our analysis shows that:
1. The actual `spans` table data is only 13GB.
2. However, 277GB is consumed by "external storage" - likely TOAST tables, dead-row space, or indexes.
3. This external storage is not reclaimed after old data is deleted by the retention policy.
4. We have already run VACUUM manually, but it did not reduce the database size.
5. We were unable to query index sizes due to query errors, which may itself point to index bloat.

Critical Questions:
1. Why would VACUUM fail to reclaim the 277GB of external storage? Is this expected behavior?
2. Does Phoenix store trace data in a way that causes TOAST table bloat that cannot be reclaimed by a standard VACUUM?
3. Is there a known issue with how Phoenix deletes old spans that prevents PostgreSQL from reclaiming space?
4. Would `VACUUM FULL` be required instead of a regular `VACUUM`? If so, are there any risks or special considerations for running it on the `spans` table?
5. Could this be related to long-running transactions or other PostgreSQL configuration issues specific to how Phoenix uses the database?
6. Are there any Phoenix-specific maintenance procedures we should follow beyond standard PostgreSQL maintenance?

What We've Tried:
- Configured the Data Retention Policy.
- Manually ran VACUUM (did not reduce size).
- Analyzed table sizes to identify the issue (the size-breakdown queries are sketched below).

Impact:
We are consuming significantly more storage (~291GB) than the actual data size (13GB). We would like to understand how to properly maintain the database in a self-hosted deployment and reclaim this storage.

Any guidance on resolving this would be much appreciated. Thank you for your assistance.
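Diagnostic queries (for reference):
The size breakdown above came from queries along the following lines. This is only a sketch: we assume the table is named `spans` in the `public` schema, so the identifiers may need adjusting in other setups. We intend to re-run a per-index query like the second one to check for index bloat.

```sql
-- Heap vs. TOAST vs. index size for the spans table (assumes public.spans).
-- TOAST size is roughly table_plus_toast minus heap_only.
SELECT
    pg_size_pretty(pg_total_relation_size('public.spans')) AS total_size,
    pg_size_pretty(pg_table_size('public.spans'))          AS table_plus_toast,
    pg_size_pretty(pg_relation_size('public.spans'))       AS heap_only,
    pg_size_pretty(pg_indexes_size('public.spans'))        AS indexes_only;

-- Per-index sizes, to locate index bloat.
SELECT indexrelid::regclass                         AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_index
WHERE indrelid = 'public.spans'::regclass
ORDER BY pg_relation_size(indexrelid) DESC;
```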
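Regarding question 5, these are the checks we plan to run to see whether anything is pinning the xmin horizon and preventing VACUUM from removing dead rows. Again only a sketch; the one-hour threshold is arbitrary.

```sql
-- Sessions with transactions open for more than an hour; an old backend_xmin
-- here keeps VACUUM from removing rows deleted by the retention policy.
SELECT pid, state, xact_start, backend_xmin, left(query, 80) AS query
FROM pg_stat_activity
WHERE xact_start < now() - interval '1 hour'
ORDER BY xact_start;

-- Replication slots and prepared transactions can also pin the xmin horizon.
SELECT slot_name, active, xmin, catalog_xmin FROM pg_replication_slots;
SELECT gid, prepared, owner, database FROM pg_prepared_xacts;

-- Dead tuples and last (auto)vacuum times for the spans table.
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'spans';
```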
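For question 4, our current understanding (please correct us if this is wrong for Phoenix) is that a plain VACUUM only marks dead space as reusable inside the existing table, TOAST, and index files; it does not shrink them on disk. So a one-time `VACUUM FULL`, or an online rewrite with the `pg_repack` extension (which we believe Cloud SQL supports), would be needed to actually return the 277GB to the operating system. This is what we are considering running:

```sql
-- Rewrites the spans table, its TOAST table, and its indexes into new,
-- compacted files and returns the freed space to the OS.
-- Caveats: takes an ACCESS EXCLUSIVE lock (blocks reads and writes on spans
-- for the duration) and needs enough free disk for a full copy of the table.
VACUUM (FULL, VERBOSE, ANALYZE) spans;
```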
