We regularly get timeouts querying traces in the UI when using metadata filters. We've turned data retention down to 15 days, but if we expand beyond a 4 hour time window we still get timeouts. We're doing about 2-4 million traces a week, so a bit heavy. The db is running on a t4g.large. We're planning to scale up the db further to see if it helps, but wondering if there's anything else we can do? Seems like it'd be ideal to partition the spans table somehow. Our spans table is 407gb. Or maybe something else?
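The partitioning idea above could be sketched with Postgres declarative range partitioning on a timestamp column. This is only an illustration, not Phoenix's actual schema or migration path: the table name, columns, and weekly boundaries below are all assumptions.

```sql
-- Sketch only: assumes a timestamp column (start_time) to range-partition on.
-- Phoenix's real spans schema is not reproduced here.
CREATE TABLE spans_partitioned (
    id          BIGINT      NOT NULL,
    trace_rowid BIGINT      NOT NULL,
    start_time  TIMESTAMPTZ NOT NULL,
    attributes  JSONB
) PARTITION BY RANGE (start_time);

-- One partition per retention window; expired partitions can then be
-- dropped instantly instead of deleted row by row.
CREATE TABLE spans_y2024w01 PARTITION OF spans_partitioned
    FOR VALUES FROM ('2024-01-01') TO ('2024-01-08');
```

The main win for a retention-limited workload like this is that `DROP TABLE` on an old partition is near-instant and reclaims space immediately, whereas bulk `DELETE` leaves bloat behind.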
🔒[private user] help here?
Wondering if the query is loading everything in mem as well. Haven't tracked down the query yet but this is what our memory does when we visit the dashboard
Metadata filtering hasn't been optimized for querying on a database this large. If you have the ability to modify the database, you can try creating a specialized index to improve performance, provided you know the specific key you're searching for. For example:
CREATE INDEX CONCURRENTLY ix_spans_attributes_metadata_conversation_id
ON public.spans ((attributes #>> '{metadata,conversation_id}'::text[]))
WHERE (attributes #>> '{metadata,conversation_id}'::text[]) IS NOT NULL;

The dashboard simultaneously executes several separate time-series queries grouped by time and other dimensions. The query planner is likely choosing a hash join on the trace table, which can lead to higher memory consumption during execution.
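Once an index like that exists, it's worth confirming the planner actually uses it. A quick check, where the filter value `'abc123'` is just a made-up example:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM public.spans
WHERE attributes #>> '{metadata,conversation_id}'::text[] = 'abc123';
-- In the plan output, look for
--   Index Scan using ix_spans_attributes_metadata_conversation_id
-- rather than a Seq Scan on spans.
```

Note the expression in the query must match the indexed expression exactly (same `#>>` path and cast) or the partial index won't be considered.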
But I don't have a precise explanation for the amount of memory it's using.
Ok awesome, thanks! We'll try adding some indexes. Is there an environment variable to increase the timeouts in the dashboard? When we load data using phoenix-client, we set a higher timeout to avoid issues. Can't seem to find one for the UI
The UI doesn't have a timeout per se. I think the timeout probably comes from your proxy or load balancer.
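If a reverse proxy does sit in front of Phoenix, its per-request read timeout is usually what fires first. For nginx, for example, that would look roughly like this (the upstream address and the 300s value are illustrative; nginx's default `proxy_read_timeout` is 60s):

```
location / {
    # Assumed upstream: Phoenix's default port is 6006.
    proxy_pass http://phoenix:6006;
    # Allow long-running dashboard queries to complete before the
    # proxy gives up on the response (default is 60s).
    proxy_read_timeout 300s;
    proxy_send_timeout 300s;
}
```

Load balancers (e.g. an AWS ALB) have an equivalent idle-timeout setting that would need the same treatment.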
Yep sorry. To clarify, query timeouts for the queries the dashboard is making
