Hello 👋 Request: looks like the latest release of Phoenix included a new index for session ids, neat! https://github.com/Arize-ai/phoenix/pull/11028 However, our spans table is 712GB and the index is not created concurrently, which locked the table and blew everything up. Can we force that on new PRs going forward (where it makes sense, of course)? Working on killing and manually creating this stuff currently.
The new session.id index for spans in Phoenix is not created concurrently, which can lock large tables and disrupt operations. The current migration does not use the CONCURRENTLY option for index creation, so table locking is expected for large datasets. There is no documented way to force concurrent index creation in Phoenix migrations as of now; this would require a change in future PRs to use CREATE INDEX CONCURRENTLY where appropriate for PostgreSQL, but according to the referenced PR and its discussion this is not yet implemented or configurable in the current codebase or migration scripts.

Would you like more detail on how to manually create the index concurrently, or guidance on best practices for large table migrations?

Sources:
- feat(db): add session.id index for spans across sqlite and postgres
💡 Hint: Mention RunLLM in the thread for followups.
Hey Peter, so sorry you hit that. We rolled it into a major version release thinking it was safe, but given it was a big change I guess we didn't quite know it would lock things up like that. Let us know if there's anything we can do to help. In the future we'll be extra careful here.
Hey Peter, glad you're back up and running. Wanted to share some more context and a workaround for anyone else who hits this.

Why CONCURRENTLY wasn't used in the migration: CREATE INDEX CONCURRENTLY cannot run inside a transaction, and Alembic (our migration tool) wraps each migration in one.

Workaround for large databases: If you need to get past the migration without the table lock, you can pre-create a no-op index with the same name before running the migration. Since the migration uses IF NOT EXISTS, it checks the index name and skips if it already exists:
CREATE INDEX CONCURRENTLY IF NOT EXISTS ix_spans_session_id
ON spans ((attributes #>> '{session,id}'))
WHERE false;

The WHERE false makes it instant: PostgreSQL recognizes the predicate is always false and skips the heap scan entirely, so the index is created in milliseconds regardless of table size. Once the migration sees the name exists, it's a no-op. You can then backfill the real index at your leisure:
DROP INDEX CONCURRENTLY IF EXISTS ix_spans_session_id;
CREATE INDEX CONCURRENTLY IF NOT EXISTS ix_spans_session_id
ON spans (((attributes #>> '{session,id}')::varchar))
WHERE ((attributes #>> '{session,id}')::varchar) IS NOT NULL;

Sorry again for the disruption, and thanks for the callout.
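For anyone scripting this, the one constraint to keep in mind is that CREATE INDEX CONCURRENTLY refuses to run inside a transaction, so whatever driver you use has to be in autocommit mode (e.g. psycopg's `conn.autocommit = True`). A minimal sketch of the two-step workaround; the `run_workaround` helper and the recording `log` are illustrative stand-ins, not Phoenix code, and the SQL matches the statements above:

```python
# Step 1: before upgrading Phoenix, create an instant no-op index so the
# migration's CREATE INDEX IF NOT EXISTS becomes a no-op.
PRE_CREATE = """\
CREATE INDEX CONCURRENTLY IF NOT EXISTS ix_spans_session_id
ON spans ((attributes #>> '{session,id}'))
WHERE false;"""

# Step 2: after the upgrade, drop the placeholder and build the real index
# (same definition the migration would have created).
BACKFILL = [
    "DROP INDEX CONCURRENTLY IF EXISTS ix_spans_session_id;",
    """\
CREATE INDEX CONCURRENTLY IF NOT EXISTS ix_spans_session_id
ON spans (((attributes #>> '{session,id}')::varchar))
WHERE ((attributes #>> '{session,id}')::varchar) IS NOT NULL;""",
]

def run_workaround(execute, phase):
    """Run one phase: 'pre' before the Phoenix upgrade, 'post' after it.

    `execute` is any callable that runs a single statement on an
    autocommit connection (CONCURRENTLY cannot run in a transaction).
    """
    statements = [PRE_CREATE] if phase == "pre" else BACKFILL
    for sql in statements:
        execute(sql)

# Dry run with a recording stub instead of a real Postgres connection:
log = []
run_workaround(log.append, "pre")
run_workaround(log.append, "post")
```

Swapping `log.append` for a real cursor's execute keeps the same ordering guarantees: placeholder first, upgrade in between, backfill last.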
Ah, looks like it should be possible https://github.com/sqlalchemy/alembic/issues/277#issuecomment-2948601425
Thanks for the link! That autocommit_block approach is definitely cleaner than the raw COMMIT workaround.

That said, CONCURRENTLY only avoids the table lock. It still does a full table scan to build the index, so on a 712GB table it would still take a long time. And since Phoenix runs migrations on startup, the app blocks until the build finishes either way, regardless of locking.

For large databases, the recommended approach is to pre-create the index while the old version is still running (so ingestion continues), then upgrade Phoenix. The migration uses IF NOT EXISTS, so it skips if the index name already exists, and the app starts instantly. We're adding step-by-step instructions for that to the migration guide: https://github.com/Arize-ai/phoenix/blob/main/MIGRATION.md
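For reference, the pattern from that Alembic issue looks roughly like this. It relies on Alembic's `op.get_context().autocommit_block()`, which commits the migration's transaction and runs the enclosed statements in autocommit mode; the `StubOp` below is a stand-in for Alembic's `op` so the sketch runs standalone (in a real migration you would `from alembic import op` instead):

```python
from contextlib import contextmanager

class StubOp:
    """Mimics the two Alembic APIs this pattern uses:
    op.get_context().autocommit_block() and op.execute()."""

    def __init__(self):
        self.executed = []

    def get_context(self):
        return self

    @contextmanager
    def autocommit_block(self):
        # Alembic's real autocommit_block commits the surrounding
        # transaction, switches the connection to autocommit for the
        # duration of the block, then restores transactional mode.
        yield

    def execute(self, sql):
        self.executed.append(sql)

op = StubOp()

def upgrade():
    # CREATE INDEX CONCURRENTLY cannot run inside a transaction, so it
    # must go inside the autocommit block.
    with op.get_context().autocommit_block():
        op.execute(
            "CREATE INDEX CONCURRENTLY IF NOT EXISTS ix_spans_session_id "
            "ON spans (((attributes #>> '{session,id}')::varchar))"
        )

upgrade()
```

This is a sketch of the pattern, not the actual Phoenix migration; the index definition here mirrors the backfill statement above.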
Hrm, I guess most of my experience is in Rails, but there, when you run concurrently, it moves on to the next migration since concurrent index creation is an async operation. Perhaps a thing to test.
I would assume that's how it would work for Alembic as well, since the Postgres driver should just return when the index creation is submitted.
We keep our Phoenix version updated automatically, so we might just have to make that a manual process for now, I suppose.
Good idea, so we went ahead and tested it. Turns out CONCURRENTLY is synchronous to the calling connection. The "concurrently" part means it doesn't lock the table for other sessions, but the connection that issued it still waits for the whole build to finish. So the Postgres driver doesn't just return; it blocks until the index is ready.

Here's what we saw on a 5.7M-row / 11 GB spans table, measured from Phoenix server startup (on an M3 MacBook Pro):
Approach                                 Time   Table locked?
----------------------------------------------------------------
Original migration (no CONCURRENTLY)     ~11s   Yes
Migration with CONCURRENTLY              ~26s   No
Pre-create WHERE false idx + migration   <1s    No

Since Phoenix runs migrations on startup, the app blocks until the build finishes either way. CONCURRENTLY just avoids the table lock.
Ah ok. So it seems like if it did it concurrently, nothing would change on startup, except the existing instance wouldn't be hosed while the migration is running.
Seems better-ish? CONCURRENTLY will use more CPU, but it seems worth it.
We are probably a rare use case as we are a code review tool. That 700GB of spans is only 7 days of data for us (we have retention set to 7 days).
Would be cool to have, like, an Elasticsearch or Redis backend option for the high volume use cases.
