Hello,

Following our previous discussion about storage usage, I've reduced the data retention period and wanted to report the results.

**Actions Taken:**
- Reduced retention period from 60 days to 30 days
**Results:**
- Traces reduced: 1.4M → 580K (58% reduction)
- Spans reduced: 21.6M → 10M (54% reduction)
- Date range: 31 days (retention policy working correctly)
- Dead tuples: 0.02% (minimal)
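For reference, the date range above can be confirmed with a simple range query along these lines (a sketch only; the `start_time` column name is my assumption about the Phoenix `spans` schema):

```sql
-- Sketch: confirm the retention window by looking at the span time range.
-- Column name start_time is assumed; adjust to the actual Phoenix schema.
SELECT
    min(start_time)                   AS oldest_span,
    max(start_time)                   AS newest_span,
    max(start_time) - min(start_time) AS date_range
FROM spans;
```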
**Problem:**
- Database size: Still 292GB (TOAST: 270GB)
- Expected size based on span count: 123GB (10M spans × 13KB average)
- Actual size is 2.2x larger than expected
**Database Details:**
- Total traces: 579,430
- Total spans: 9,960,828
- Database size: 292GB
- Spans table total: 291GB
- TOAST size: 270GB
- Dead tuple %: 0.02%
- Last autovacuum: Recent (working normally)
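For completeness, the size and vacuum figures above can be reproduced with standard catalog/statistics queries roughly like the following (sketches; they assume the `spans` table resolves via the default search path and that the role has the usual monitoring privileges on Cloud SQL):

```sql
-- Sketch: heap vs. TOAST vs. index split for the spans table.
SELECT
    pg_size_pretty(pg_total_relation_size('spans'))      AS spans_total,
    pg_size_pretty(pg_relation_size('spans'))             AS heap_size,
    pg_size_pretty(pg_indexes_size('spans'))               AS indexes_size,
    pg_size_pretty(pg_total_relation_size(reltoastrelid)) AS toast_size
FROM pg_class
WHERE oid = 'spans'::regclass;

-- Sketch: dead-tuple percentage and last (auto)vacuum times.
SELECT
    n_live_tup,
    n_dead_tup,
    round(100.0 * n_dead_tup / greatest(n_live_tup + n_dead_tup, 1), 2) AS dead_pct,
    last_vacuum,
    last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'spans';
```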
**Question:**

It appears that while the data retention policy successfully deleted old traces/spans, the TOAST table space was not reclaimed. Based on our earlier discussion, I understand this is expected PostgreSQL behavior and that VACUUM FULL would be required to reclaim the physical space. However, given that:

- We're running Phoenix on Google Cloud Run with Cloud SQL
- VACUUM FULL requires exclusive table locks
- This is a production environment receiving continuous traces

What is the recommended approach to safely reclaim the 147GB of unused TOAST space? Are there:

- Specific maintenance windows or procedures recommended for self-hosted Phoenix?
- Alternative approaches that don't require full table locks? (A sketch of what we have in mind is included below.)
- Configuration options to prevent this issue from recurring after future retention cleanups?
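To make the second point concrete, we are wondering whether something like pg_repack is the right tool here. A rough sketch of what we have in mind follows; we have not validated that the extension can be enabled on our Cloud SQL instance, and the connection details are placeholders.

```sql
-- Sketch only (not validated on our instance): pg_repack rebuilds a table
-- online, holding only brief locks, which can reclaim TOAST space without the
-- long ACCESS EXCLUSIVE lock that VACUUM FULL takes.
CREATE EXTENSION IF NOT EXISTS pg_repack;

-- The rebuild itself is then driven from the pg_repack client, e.g.:
--   pg_repack --host=<cloud-sql-private-ip> --dbname=<phoenix_db> \
--             --table=public.spans --no-order
```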
Thank you for your guidance.
Roger Y.

Thank you for the diagnostic queries. Here are the results:

**Query 1 - Size Breakdown:**
- toast_heap_size: 269 GB
- toast_index_size: 3084 MB (3 GB)
- heap_size: 13 GB
- indexes_size: 4959 MB (5 GB)
- toast_pct: 93.8%

**Query 2 - Span Count:**
- estimated_row_count: 21,557,095

**Query 3 - Average JSONB Size:**
- avg_attributes_size: 11 kB
- avg_events_size: 14 bytes
- estimated_total_jsonb: 221 GB

**Query 4 - Dead Tuples:**
- dead_pct: 0.58%

**Query 5 - Replication Slots:**
- Empty (no replication slots)

**Current Retention Policy:** At 07:00 PM, only on Sunday (UTC), 60 days

Based on these results, it appears the 269GB is indeed live TOAST data from span attributes rather than reclaimable dead space. Our use case involves [RAG/LLM applications with large context windows / specific use case details], which would explain the 11KB average per span.

Given this understanding, what would be the recommended approach to manage storage costs for our self-hosted deployment? Should we:

1. Reduce the retention period?
2. Implement span attribute filtering before ingestion?
3. Are there any compression options for TOAST storage?

Thank you for helping us understand this issue.
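As context for question 3, the sketch below shows (a) the kind of per-column size measurement I assume is behind the "Query 3" figures, and (b) the per-column TOAST compression setting we are asking about. Both are sketches: the `attributes`/`events` column names are assumptions about the Phoenix `spans` schema, and the compression option requires PostgreSQL 14+ and only affects newly written values.

```sql
-- Sketch: average stored (TOASTed/compressed) size of the JSONB columns.
-- Column names attributes/events are assumed; this scans the whole table.
SELECT
    pg_size_pretty(avg(pg_column_size(attributes))::bigint) AS avg_attributes_size,
    pg_size_pretty(avg(pg_column_size(events))::bigint)     AS avg_events_size,
    pg_size_pretty(sum(pg_column_size(attributes))
                 + sum(pg_column_size(events)))              AS estimated_total_jsonb
FROM spans;

-- Sketch: per-column compression (PostgreSQL 14+). lz4 is faster than the
-- default pglz but not necessarily smaller, and it applies only to values
-- written after the change, not to existing rows.
ALTER TABLE spans ALTER COLUMN attributes SET COMPRESSION lz4;
```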
Thank you for the detailed explanation and the documentation link. I appreciate the clarification that this is expected PostgreSQL behavior.

However, I need to clarify an important point: **We have already run VACUUM FULL on the database, but the 277GB of external storage was not reclaimed.**

Given this situation, I have the following questions:

1. **Why would VACUUM FULL fail to reclaim space?**
   - Are there specific conditions that would prevent VACUUM FULL from reclaiming TOAST table space?
   - Could long-running transactions be preventing space reclamation? How can we check for this?
   - Could there be active connections or sessions that are blocking the operation?
2. **Dead tuples and transaction IDs:**
   - Should we check for dead tuples that cannot be vacuumed due to old transaction IDs?
   - Are there specific PostgreSQL queries you recommend to diagnose why VACUUM FULL didn't reclaim space?
3. **Index bloat:**
   - We encountered errors when querying index sizes. Could bloated indexes be consuming the 277GB?
   - Would running REINDEX help in this situation?
   - Are there safe queries to check index bloat on the `spans` table?
4. **Alternative approaches:**
   - If VACUUM FULL cannot reclaim the space, are there other approaches we should consider?
   - Would recreating the `spans` table (e.g., via `CREATE TABLE ... AS SELECT` and swapping) be a viable option?

**Current situation:**
- Total database size: 291GB
- `spans` table: 290GB total (13GB actual data, 277GB external storage)
- ~1.4 million traces in the system
- VACUUM FULL has been executed but did not reduce the size

This is concerning because even the most aggressive standard PostgreSQL maintenance operation (VACUUM FULL) did not resolve the issue. Any guidance on diagnosing why VACUUM FULL failed to reclaim space would be greatly appreciated.
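For context, these are the diagnostics we understand to be relevant and plan to run next. They are sketches against standard PostgreSQL system views; the `spans` table name and the one-hour threshold are placeholders to adjust.

```sql
-- 1. Long-running or idle-in-transaction sessions that could pin old row versions.
SELECT pid, state, xact_start, left(query, 80) AS query
FROM pg_stat_activity
WHERE xact_start < now() - interval '1 hour'
ORDER BY xact_start;

-- 2. Orphaned prepared transactions (these also hold back the xmin horizon).
SELECT * FROM pg_prepared_xacts;

-- 3. Replication slots pinning old transactions/WAL.
SELECT slot_name, active, xmin, restart_lsn FROM pg_replication_slots;

-- 4. Per-index sizes on the spans table (to rule bloated indexes in or out).
SELECT indexrelid::regclass AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'spans'
ORDER BY pg_relation_size(indexrelid) DESC;
```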
Hi, we are running a self-hosted Phoenix instance and experiencing a critical database storage issue where our PostgreSQL database has grown to 291GB, with the vast majority being external storage that is not reclaimed by either the Data Retention Policy or manual VACUUM operations.

Environment Details:
- Phoenix Version: 12.11.1
- Deployment: Google Cloud Run with external PostgreSQL database (via private IP)
- Database: PostgreSQL (Google Cloud SQL)
- Current Data: ~720,000 frontend traces and ~650,000 backend traces

Database Size Breakdown:
- Total database size: 291GB
- `spans` table total size: 290GB
- Table size (actual rows): 13GB
- External size (TOAST/indexes/etc.): 277GB (95% of total)

Issue Description:

Despite configuring and running the Data Retention Policy to delete old traces, the database continues to grow. The analysis reveals that:

1. The actual `spans` table data is only 13GB
2. However, 277GB is consumed by "external storage" - likely TOAST tables, deleted row space, or indexes
3. This external storage is not being reclaimed after old data is deleted by the retention policy
4. We have already manually run VACUUM operations, but they did not reduce the database size
5. We were unable to query index sizes due to query errors, suggesting potential issues with index bloat

Critical Questions:

1. Why would VACUUM operations fail to reclaim the 277GB of external storage? Is this expected behavior?
2. Does Phoenix store trace data in a way that causes TOAST table bloat that cannot be reclaimed by standard VACUUM?
3. Is there a known issue with how Phoenix deletes old spans that prevents PostgreSQL from reclaiming space?
4. Would `VACUUM FULL` be required instead of regular `VACUUM`? If so, are there any risks or special considerations for running it on the `spans` table?
5. Could this be related to long-running transactions or other PostgreSQL configuration issues specific to how Phoenix uses the database?
6. Are there any Phoenix-specific maintenance procedures we should be following beyond standard PostgreSQL maintenance?

What We've Tried:
- Configured the Data Retention Policy
- Manually ran VACUUM operations (did not reduce size)
- Analyzed table sizes to identify the issue

Impact:

We're looking to understand how to properly maintain the database in a self-hosted deployment and reclaim this storage, as we're consuming significantly more storage (~291GB) than the actual data size (13GB).

We would appreciate guidance on resolving this storage issue. Thank you for your assistance.
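For clarity on question 4, this is the operation we are asking about (a sketch only; we have not run it at this point because of the locking concern):

```sql
-- VACUUM FULL rewrites the spans table and its TOAST table into new files,
-- which is what actually returns space to the operating system, but it holds
-- an ACCESS EXCLUSIVE lock for the duration and needs roughly as much free
-- disk as the rewritten table while it runs.
VACUUM (FULL, VERBOSE) spans;
```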
