I'm curious what the procedure is for handling database migration errors. The Phoenix container is distroless, so we can't shell in and run an alembic command. I'm really nervous about this breaking our production instance when someone upgrades the container and doesn't realize there are breaking migrations.
What would be ideal is a way to manually run the migration against an offline replica first.
Phoenix runs database migrations automatically at startup, but if a migration fails and the container is distroless (no shell access), you must clone the Phoenix repository locally, navigate to src/phoenix/db, set the PHOENIX_SQL_DATABASE_URL environment variable, and run Alembic commands (e.g., alembic upgrade head or alembic stamp head) from your local environment to manually resolve migration issues. This is the only supported way to recover from a dirty migration state when you can't shell into the container directly, as described in the official docs and repo instructions.

For production, always pin the Phoenix version and test migrations in a staging environment before upgrading to avoid unexpected breaking changes or downtime.

If you need a shell for debugging or manual migration inside a container, you can build and run the arizephoenix/phoenix:latest-debug image, which includes a shell, or modify the Dockerfile to use a debug base image. For persistent issues, reach out to the Arize community Slack for support.

Would you like more detail on the manual migration steps or Docker debug image usage?

Sources:
- Phoenix DB README: Manual Migrations
- Phoenix Issue: Migration Errors & Alembic CLI
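The manual steps in the answer above can be sketched as a small Python wrapper. This is only an illustration under assumptions: it presumes the `alembic` CLI is installed in the local environment, that the checkout matches the Phoenix version being deployed, and that the helper names `build_alembic_call` and `run_alembic` are hypothetical (they are not part of Phoenix).

```python
import os
import subprocess
from pathlib import Path

# Directory inside a Phoenix checkout where the answer above says to run Alembic.
ALEMBIC_DIR = Path("src") / "phoenix" / "db"


def build_alembic_call(repo_root, database_url, *alembic_args):
    """Return (argv, cwd, env) for one Alembic invocation; nothing is executed."""
    argv = ["alembic", *alembic_args]
    cwd = Path(repo_root) / ALEMBIC_DIR
    # Copy the current environment and point Alembic at the target database.
    env = {**os.environ, "PHOENIX_SQL_DATABASE_URL": database_url}
    return argv, cwd, env


def run_alembic(repo_root, database_url, *alembic_args):
    """Run Alembic from the checkout; raises CalledProcessError on a non-zero exit."""
    argv, cwd, env = build_alembic_call(repo_root, database_url, *alembic_args)
    subprocess.run(argv, cwd=cwd, env=env, check=True)


# Example call (hypothetical path and URL, replace with your own):
# run_alembic("./phoenix", "postgresql://user:password@db-host:5432/phoenix", "upgrade", "head")
```

Separating the command construction from its execution makes it possible to print and review exactly what would run before pointing it at a real database.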
That's not really something we can tolerate in production
Curious how migrations are tested before released?
Gotcha. Apologies, this is new territory for me -- our database reliability policy has always been - do not use schema migration tools 🙂
because of this exact reason
It certainly is a tricky thing and we try to not do migrations often other than for net new functionality. Most migrations are entirely new tables. We also only ship migrations on major version bumps to give a heads-up, and they are documented here: https://github.com/Arize-ai/phoenix/blob/main/MIGRATION.md Hope that alleviates some fears 💀
you could certainly attach a new phoenix version on a replica to test out the migration before you move it on to the primary. Not sure if that works.
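One way to rehearse an upgrade before touching the primary is sketched below. It is a minimal sketch with assumptions: the copy must be writable (a read-only streaming replica would reject schema changes, so a restored snapshot or a promoted replica is assumed), the `alembic` CLI is installed locally, and `rehearse_migration` is a hypothetical helper name, not a Phoenix API.

```python
import os
import subprocess


def rehearse_migration(alembic_dir, copy_url, primary_url, runner=subprocess.run):
    """Run `alembic upgrade head` against a database copy.

    Returns True if Alembic exited with status 0, False otherwise.
    """
    # Guard against accidentally rehearsing on the production database.
    if copy_url == primary_url:
        raise ValueError("refusing to rehearse against the primary database")
    env = {**os.environ, "PHOENIX_SQL_DATABASE_URL": copy_url}
    result = runner(["alembic", "upgrade", "head"], cwd=alembic_dir, env=env)
    return result.returncode == 0


# Example call (hypothetical URLs, replace with your own):
# ok = rehearse_migration(
#     "phoenix/src/phoenix/db",
#     "postgresql://user:password@scratch-host:5432/phoenix_copy",
#     "postgresql://user:password@primary-host:5432/phoenix",
# )
```

If the rehearsal returns False, the primary is untouched and the Alembic output from the copy can be shared when asking for help.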
also we do have debug builds so you can shell into those
Yea, this definitely helps
I am unable to fix this migration so far tho
i tried alembic stamp head, alembic upgrade head
The Phoenix version and full stacktrace when booting Phoenix would help.
