Hi friends, we are ingesting about 10k traces per day in our self-hosted Phoenix instance. We've noticed the homepage load of Phoenix is getting painfully slow with the default "Last 30 days" timeframe. In particular, the following query has an avg execution time of 30s, based on our PostgreSQL query insight:
SELECT
    traces.project_rowid,
    COALESCE(SUM(spans.llm_token_count_prompt), $1::INTEGER) AS prompt,
    COALESCE(SUM(spans.llm_token_count_completion), $2::INTEGER) AS completion,
    COALESCE(SUM(spans.llm_token_count_prompt), $1::INTEGER) + COALESCE(SUM(spans.llm_token_count_completion), $2::INTEGER) AS total
FROM traces
JOIN spans ON traces.id = spans.trace_rowid
WHERE spans.start_time >= $3::TIMESTAMP WITH TIME ZONE
    AND traces.project_rowid IN ($4::INTEGER, $5::INTEGER, $6::INTEGER)
GROUP BY traces.project_rowid

This looks to be the stats query that populates the counters for each project panel on the homepage. Is there a way to optimize the indexes so this query runs efficiently?
This slow query is a known issue in Phoenix when handling large datasets: the join between traces and spans over millions of rows becomes expensive even with indexes. The Phoenix team recommends advanced indexing (such as composite or covering indexes on the join and filter columns), query refactoring, or pre-aggregation strategies to improve performance, but there is no official documentation or silver bullet for this specific query. As workarounds, limiting the time range, reducing the number of projects in the query, or pre-aggregating token counts per project and time window can help (GitHub Issue #8068, GitHub Issue #4006).

Recent Phoenix releases have made improvements such as lazy loading, reduced page sizes, and selective attribute fetching, which may help if you upgrade. For deeper optimization, consider adding composite indexes on (trace_rowid, start_time) in spans and (project_rowid, id) in traces, and/or pre-aggregating token counts outside of the main query path (GitHub Issue #4006). Would you like more detail on index strategies or pre-aggregation approaches?

Sources:
- GitHub Issue #8068
- GitHub Issue #4006
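
A rough sketch of the composite/covering indexes described above, assuming the schema implied by the query (spans.trace_rowid, spans.start_time, the two token-count columns, and traces.project_rowid / traces.id). The index names here are made up:

CREATE INDEX CONCURRENTLY IF NOT EXISTS ix_spans_trace_time_tokens
    ON spans (trace_rowid, start_time)
    INCLUDE (llm_token_count_prompt, llm_token_count_completion);

CREATE INDEX CONCURRENTLY IF NOT EXISTS ix_traces_project_id
    ON traces (project_rowid, id);

The INCLUDE clause (PostgreSQL 11+) lets the SUMs be served by an index-only scan without touching the spans heap, and CONCURRENTLY avoids blocking writes during the build (note it cannot run inside a transaction block). Verify with EXPLAIN (ANALYZE, BUFFERS) before and after to confirm the planner actually uses the new indexes.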
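For the pre-aggregation route mentioned above, one hedged sketch is a daily materialized view refreshed outside the request path. The view name and refresh schedule are assumptions, and Phoenix itself won't query this view, so it's mainly useful for external dashboards or a patched stats query:

CREATE MATERIALIZED VIEW project_token_daily AS
SELECT
    traces.project_rowid,
    date_trunc('day', spans.start_time) AS day,
    COALESCE(SUM(spans.llm_token_count_prompt), 0)     AS prompt,
    COALESCE(SUM(spans.llm_token_count_completion), 0) AS completion
FROM traces
JOIN spans ON traces.id = spans.trace_rowid
GROUP BY 1, 2;

-- Unique index required for REFRESH ... CONCURRENTLY:
CREATE UNIQUE INDEX ON project_token_daily (project_rowid, day);

-- Run periodically (e.g. via cron or pg_cron):
REFRESH MATERIALIZED VIEW CONCURRENTLY project_token_daily;

-- The 30-day stats then reduce to a cheap range scan:
SELECT project_rowid,
       SUM(prompt)              AS prompt,
       SUM(completion)          AS completion,
       SUM(prompt + completion) AS total
FROM project_token_daily
WHERE day >= now() - interval '30 days'
GROUP BY project_rowid;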
Hey Jo P. super sorry you are hitting this. This is a known issue that we are actively working towards resolving. Is it possible for you to share some anonymized stats for your largest projects such as # of traces, token count in your most expensive project in the last 30 days? This may help us replicate on our end and make further improvements
Thanks Anthony P.! Some rough stats for the last 30 days on our dominant project:
total traces: 200,000
total tokens: 200M (170M prompt, 30M completion)
are you using SQLite or PostgreSQL?
postgres
how would you characterize the depth of most of your traces?
our traces are mostly between 2 - 10 levels deep. I'd say the median is probably around 5
thanks 🙏
And thank you so much for your help with this issue!
