Hello 👋 Request: looks like the latest release of Phoenix included a new index for session ids, neat! https://github.com/Arize-ai/phoenix/pull/11028 However, our spans table is 712GB and the index is not created concurrently, which locked the table and blew everything up. Can we force that on new PRs going forward (where it makes sense, of course)? Working on killing and manually creating this stuff currently.
The new session.id index for spans in Phoenix is not created concurrently, which can lock large tables and disrupt operations. The current migration does not use the CONCURRENTLY option for index creation, so table locking is expected for large datasets. There is no documented way to force concurrent index creation in Phoenix migrations as of now; this would require a change in future PRs to use CREATE INDEX CONCURRENTLY where appropriate for PostgreSQL, but according to the referenced PR and its discussion this is not yet implemented or configurable in the current codebase or migration scripts.

Would you like more detail on how to manually create the index concurrently, or guidance on best practices for large table migrations?

Sources:
- feat(db): add session.id index for spans across sqlite and postgres
💡 Hint: Mention RunLLM in the thread for followups.
Hey Peter, so sorry you hit that. We rolled it into a major version release thinking it was safe, but given it was a big change I guess we didn't quite know it would lock things up like that. Let us know if there's anything we can do to help. In the future we'll be extra careful here.
Hey Peter, glad you're back up and running. Wanted to share some more context and a workaround for anyone else who hits this.

Why CONCURRENTLY wasn't used in the migration: CREATE INDEX CONCURRENTLY cannot run inside a transaction, and Alembic (our migration tool) wraps each migration in one.

Workaround for large databases: If you need to get past the migration without the table lock, you can pre-create a no-op index with the same name before running the migration. Since the migration uses IF NOT EXISTS, it checks the index name and skips if it already exists:
CREATE INDEX CONCURRENTLY IF NOT EXISTS ix_spans_session_id
ON spans ((attributes #>> '{session,id}'))
WHERE false;

The WHERE false makes it instant: PostgreSQL recognizes the predicate is always false and skips the heap scan entirely, so the index is created in milliseconds regardless of table size. Once the migration sees the name exists, it's a no-op. You can then backfill the real index at your leisure:
DROP INDEX CONCURRENTLY IF EXISTS ix_spans_session_id;
CREATE INDEX CONCURRENTLY IF NOT EXISTS ix_spans_session_id
ON spans (((attributes #>> '{session,id}')::varchar))
WHERE ((attributes #>> '{session,id}')::varchar) IS NOT NULL;

Sorry again for the disruption, and thanks for the callout.
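For anyone scripting this, the one constraint to keep in mind is that CREATE INDEX CONCURRENTLY refuses to run inside a transaction, so whatever driver you use has to be in autocommit mode (e.g. psycopg's `conn.autocommit = True`). A minimal sketch of the two-step workaround; the `run_workaround` helper and the recording `log` are illustrative stand-ins, not Phoenix code, and the SQL matches the statements above:

```python
# Step 1: before upgrading Phoenix, create an instant no-op index so the
# migration's CREATE INDEX IF NOT EXISTS becomes a no-op.
PRE_CREATE = """\
CREATE INDEX CONCURRENTLY IF NOT EXISTS ix_spans_session_id
ON spans ((attributes #>> '{session,id}'))
WHERE false;"""

# Step 2: after the upgrade, drop the placeholder and build the real index
# (same definition the migration would have created).
BACKFILL = [
    "DROP INDEX CONCURRENTLY IF EXISTS ix_spans_session_id;",
    """\
CREATE INDEX CONCURRENTLY IF NOT EXISTS ix_spans_session_id
ON spans (((attributes #>> '{session,id}')::varchar))
WHERE ((attributes #>> '{session,id}')::varchar) IS NOT NULL;""",
]

def run_workaround(execute, phase):
    """Run one phase: 'pre' before the Phoenix upgrade, 'post' after it.

    `execute` is any callable that runs a single statement on an
    autocommit connection (CONCURRENTLY cannot run in a transaction).
    """
    statements = [PRE_CREATE] if phase == "pre" else BACKFILL
    for sql in statements:
        execute(sql)

# Dry run with a recording stub instead of a real Postgres connection:
log = []
run_workaround(log.append, "pre")
run_workaround(log.append, "post")
```

Swapping `log.append` for a real cursor's execute keeps the same ordering guarantees: placeholder first, upgrade in between, backfill last.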
Ah, looks like it should be possible https://github.com/sqlalchemy/alembic/issues/277#issuecomment-2948601425
Thanks for the link! That autocommit_block approach is definitely cleaner than the raw COMMIT workaround.

That said, CONCURRENTLY only avoids the table lock. It still does a full table scan to build the index, so on a 712GB table it would still take a long time. And since Phoenix runs migrations on startup, the app blocks until the build finishes either way, regardless of locking.

For large databases, the recommended approach is to pre-create the index while the old version is still running (so ingestion continues), then upgrade Phoenix. The migration uses IF NOT EXISTS, so it skips if the index name already exists, and the app starts instantly. We're adding step-by-step instructions for that to the migration guide: https://github.com/Arize-ai/phoenix/blob/main/MIGRATION.md
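For reference, the pattern from that Alembic issue looks roughly like this. It relies on Alembic's `op.get_context().autocommit_block()`, which commits the migration's transaction and runs the enclosed statements in autocommit mode; the `StubOp` below is a stand-in for Alembic's `op` so the sketch runs standalone (in a real migration you would `from alembic import op` instead):

```python
from contextlib import contextmanager

class StubOp:
    """Mimics the two Alembic APIs this pattern uses:
    op.get_context().autocommit_block() and op.execute()."""

    def __init__(self):
        self.executed = []

    def get_context(self):
        return self

    @contextmanager
    def autocommit_block(self):
        # Alembic's real autocommit_block commits the surrounding
        # transaction, switches the connection to autocommit for the
        # duration of the block, then restores transactional mode.
        yield

    def execute(self, sql):
        self.executed.append(sql)

op = StubOp()

def upgrade():
    # CREATE INDEX CONCURRENTLY cannot run inside a transaction, so it
    # must go inside the autocommit block.
    with op.get_context().autocommit_block():
        op.execute(
            "CREATE INDEX CONCURRENTLY IF NOT EXISTS ix_spans_session_id "
            "ON spans (((attributes #>> '{session,id}')::varchar))"
        )

upgrade()
```

This is a sketch of the pattern, not the actual Phoenix migration; the index definition here mirrors the backfill statement above.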
Hrm, I guess most of my experience is in Rails, but there, when you run concurrently, it moves on to the next migration since concurrent index creation is an async operation. Perhaps a thing to test.
I would assume that's how it would work for Alembic as well, since the Postgres driver should just return when the index creation is submitted.
We keep our Phoenix version updated automatically, so we might just have to make that a manual process for now, I suppose.
Good idea, so we went ahead and tested it. Turns out CONCURRENTLY is synchronous to the calling connection. The "concurrently" part means it doesn't lock the table for other sessions, but the connection that issued it still waits for the whole build to finish. So the Postgres driver doesn't just return; it blocks until the index is ready.

Here's what we saw on a 5.7M-row / 11 GB spans table, measured from Phoenix server startup (on an M3 MacBook Pro):
Approach                                 Time   Table locked?
----------------------------------------------------------------
Original migration (no CONCURRENTLY)     ~11s   Yes
Migration with CONCURRENTLY              ~26s   No
Pre-create WHERE false idx + migration   <1s    No

Since Phoenix runs migrations on startup, the app blocks until the build finishes either way. CONCURRENTLY just avoids the table lock.
Ah ok. So it seems like if it did it concurrently, nothing would change on startup, except the existing instance wouldn't be hosed while the migration is running.
Seems better-ish? CONCURRENTLY will use more CPU, but it seems worth it.
We are probably a rare use case as we are a code review tool. That 700GB of spans is only 7 days of data for us (we have retention set to 7 days).
Would be cool to have, like, an Elasticsearch or Redis backend option for the high volume use cases.
