phoenix.Client.query_spans perf issue 🧵
Performance issues with phoenix.Client.query_spans are known, especially as data volume grows. Recent improvements include lazy loading, smaller page sizes, and optimized SQL queries, but large projects or traces with many spans can still cause slowdowns. Upgrading to the latest Phoenix version is recommended, as it includes these optimizations and further performance fixes for span querying and data extraction (release notes, GitHub Issue #4006). If you still experience slowness, consider filtering by time range or project, and query root spans only when possible to reduce the amount of data returned (docs). Would you like more detail on specific performance tuning or code examples?
Sources:
- GitHub Issue #4006
- release notes
💡 Hint: Mention RunLLM in the thread for followups.
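For reference, the mitigations suggested above (restricting the time range, scoping to a project, and requesting root spans only) might be sketched roughly as below. The `project_name` value and the exact parameter set are assumptions against the phoenix.Client API, so check them against your installed version; only the window and filter construction here use the standard library.

```python
import datetime

# Build a 7-day window and the metadata filter client-side (stdlib only).
end_time = datetime.datetime.now(datetime.timezone.utc)
start_time = end_time - datetime.timedelta(days=7)
where = "metadata['conversation_id'] == 'bce3eefe-4f8d-4a56-82b0-a3c7d4c8fe3a'"

def query_recent_root_spans(client, span_query_cls):
    # Hypothetical call shape: narrow by time range, project, and root
    # spans to reduce the rows the server has to scan and return.
    return client.query_spans(
        span_query_cls().where(where),
        start_time=start_time,
        end_time=end_time,
        root_spans_only=True,
        project_name="default",  # assumption: replace with your project name
        timeout=60 * 5,
    )
```

The function is only defined here, not called; in practice you would pass in your `phoenix.Client` instance and `SpanQuery` class.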
Hey guys, I've got a metadata field conversation_id attached to our spans. When I filter using this metadata in the Phoenix UI, it quickly finds and displays the relevant spans. However, when I use the same filter through the phoenix.Client, the queries are timing out. Are these two queries different behind the scenes? I'm running this on our production Phoenix instance, which currently has:
Spans table: 15,701,720 rows
Traces table: 179,536 rows
Even though that's quite a lot of spans, I'm curious why the UI returns results quickly, but the phoenix.Client doesn't. Here's the Python code I'm using:
import datetime

start_date = datetime.datetime.now() - datetime.timedelta(days=7)
end_date = datetime.datetime.now()
where = "metadata['conversation_id'] == 'bce3eefe-4f8d-4a56-82b0-a3c7d4c8fe3a'"
print(f"Querying {where}")
spans = phoenix_client.query_spans(
SpanQuery().where(where),
start_time=start_date,
end_time=end_date,
root_spans_only=True,
timeout=60 * 5,
)
Checking the AWS console for top queries, I found this query (I believe it's related):
SELECT spans.name, spans.span_kind, spans.parent_id, spans.start_time, spans.end_time,
spans.status_code, spans.status_message, spans.events,
spans.span_id AS "context.span_id", traces.trace_id AS "context.trace_id",
spans.attributes
FROM spans
JOIN traces ON traces.id = spans.trace_rowid
JOIN projects ON projects.id = traces.project_rowid
WHERE projects.name = $1::VARCHAR
AND CAST((spans.attributes #>> $2) AS VARCHAR) = $3::VARCHAR
LIMIT $4::INTEGER
I'm not completely sure if this exact query matches mine, but it seems related. Also, it appears it might not be using the date range filters (start_time and end_time). Does anyone know how the UI and client queries differ, and why there's such a performance gap?
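While a single large query times out, one mitigation is to split the time range into smaller windows and issue several query_spans calls, one per window. The window-splitting helper below is a stdlib-only sketch; the commented query_spans call is assumed to match the client code shown earlier in the thread.

```python
import datetime

def split_into_windows(start, end, hours=24):
    """Split [start, end) into consecutive windows of at most `hours` hours."""
    windows = []
    step = datetime.timedelta(hours=hours)
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows

end = datetime.datetime(2024, 1, 8)
start = end - datetime.timedelta(days=7)
windows = split_into_windows(start, end, hours=24)
# Each window can then be queried separately, e.g.:
# for w_start, w_end in windows:
#     client.query_spans(query, start_time=w_start, end_time=w_end, ...)
```

Smaller windows keep each server-side scan bounded, at the cost of more round trips; the per-window results can be concatenated client-side.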
I'm still working on this issue. Can I ask whether you're dealing with orphan spans in this situation?
we released an update in arize-phoenix 8.32.1. would you like to give that a try and let us know if it has improved?
if you don't need orphan spans, there's now an extra parameter you can set to False
orphan_span_as_root_span=False,
Hey Roger, thanks for looking into this. I think we don't have orphan spans anymore, as our current implementation removes those relationships. We're still not using the orphan_span_as_root_span flag, but our implementation doesn't send orphan spans to Phoenix anymore.
ok got it. setting that flag to False should still speed up the query, because the client generates a different SQL query in that case
ok, I'll make sure to use it then... But will that have an instant effect or will I have to wait until I collect enough spans in that mode?
oh
wait, I see now, this is to pass in the Query
I thought this was a config in the tracer
right. if you know that the spans you鈥檙e looking for aren鈥檛 orphans, then you can set it to False
spans = phoenix_client.query_spans(
SpanQuery().where(where),
start_time=start_date,
end_time=end_date,
root_spans_only=True,
orphan_span_as_root_span=False,
timeout=60 * 5,
)
in my testing with 20 million spans, this cut the runtime in half
That didn't work for me. It still times out through phoenix.Client, while the same query in the UI is very fast. I changed our solution to query by trace_id instead, which performs much faster. It would be more convenient if we could use our own id, but it's ok, as we can proceed with the trace_id.
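For anyone landing on this thread, the trace_id workaround might look like the sketch below. It assumes the Phoenix span-filter DSL accepts a trace_id equality condition, and that you can resolve your conversation_id to a trace_id from your own application data; the trace id shown is a placeholder.

```python
def trace_filter(trace_id: str) -> str:
    # Build a filter string selecting spans belonging to one trace.
    # Assumption: the Phoenix filter DSL supports `trace_id == '...'`.
    return f"trace_id == '{trace_id}'"

# Hypothetical usage, with the trace id resolved from your own records:
# spans = phoenix_client.query_spans(SpanQuery().where(trace_filter(my_trace_id)))
example_filter = trace_filter("abc123")
```

Filtering on a single trace id keeps the scan narrow, which is consistent with the speedup reported above.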
