phoenix.Client.query_spans perf issue 🧵
Performance issues with phoenix.Client.query_spans are known, especially as data volume grows. Recent improvements include lazy loading, smaller page sizes, and optimized SQL queries, but large projects or traces with many spans can still cause slowdowns. Upgrading to the latest Phoenix version is recommended, as it includes these optimizations and further performance fixes for span querying and data extraction (release notes, GitHub Issue #4006). If you still experience slowness, consider filtering by time range or project, and query root spans only when possible to reduce the amount of data returned (docs). Would you like more detail on specific performance tuning or code examples?
Sources:
- GitHub Issue #4006
- release notes
💡 Hint: Mention RunLLM in the thread for followups.
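For reference, the mitigations suggested above (restricting the time range, scoping to a project, and requesting root spans only) might be sketched roughly as below. The `project_name` value and the exact parameter set are assumptions against the phoenix.Client API, so check them against your installed version; only the window and filter construction here use the standard library.

```python
import datetime

# Build a 7-day window and the metadata filter client-side (stdlib only).
end_time = datetime.datetime.now(datetime.timezone.utc)
start_time = end_time - datetime.timedelta(days=7)
where = "metadata['conversation_id'] == 'bce3eefe-4f8d-4a56-82b0-a3c7d4c8fe3a'"

def query_recent_root_spans(client, span_query_cls):
    # Hypothetical call shape: narrow by time range, project, and root
    # spans to reduce the rows the server has to scan and return.
    return client.query_spans(
        span_query_cls().where(where),
        start_time=start_time,
        end_time=end_time,
        root_spans_only=True,
        project_name="default",  # assumption: replace with your project name
        timeout=60 * 5,
    )
```

The function is only defined here, not called; in practice you would pass in your `phoenix.Client` instance and `SpanQuery` class.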
Hey guys, I've got a metadata field conversation_id attached to our spans. When I filter using this metadata in the Phoenix UI, it quickly finds and displays the relevant spans. However, when I use the same filter through the phoenix.Client, the queries are timing out. Are these two queries different behind the scenes? I'm running this on our production Phoenix instance, which currently has:
Spans table: 15,701,720 rows
Traces table: 179,536 rows
Even though that's quite a lot of spans, I'm curious why the UI returns results quickly, but the phoenix.Client doesn't. Here's the Python code I'm using:
import datetime

start_date = datetime.datetime.now() - datetime.timedelta(days=7)
end_date = datetime.datetime.now()
where = "metadata['conversation_id'] == 'bce3eefe-4f8d-4a56-82b0-a3c7d4c8fe3a'"
print(f"Querying {where}")
spans = phoenix_client.query_spans(
SpanQuery().where(where),
start_time=start_date,
end_time=end_date,
root_spans_only=True,
timeout=60 * 5,
)
Checking the AWS console for top queries, I found this query (I believe it's related):
SELECT spans.name, spans.span_kind, spans.parent_id, spans.start_time, spans.end_time,
spans.status_code, spans.status_message, spans.events,
spans.span_id AS "context.span_id", traces.trace_id AS "context.trace_id",
spans.attributes
FROM spans
JOIN traces ON traces.id = spans.trace_rowid
JOIN projects ON projects.id = traces.project_rowid
WHERE projects.name = $1::VARCHAR
AND CAST((spans.attributes #>> $2) AS VARCHAR) = $3::VARCHAR
LIMIT $4::INTEGER
I'm not completely sure if this exact query matches mine, but it seems related. Also, it appears it might not be using the date range filters (start_time and end_time). Does anyone know how the UI and client queries differ, and why there's such a performance gap?
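While a single large query times out, one mitigation is to split the time range into smaller windows and issue several query_spans calls, one per window. The window-splitting helper below is a stdlib-only sketch; the commented query_spans call is assumed to match the client code shown earlier in the thread.

```python
import datetime

def split_into_windows(start, end, hours=24):
    """Split [start, end) into consecutive windows of at most `hours` hours."""
    windows = []
    step = datetime.timedelta(hours=hours)
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows

end = datetime.datetime(2024, 1, 8)
start = end - datetime.timedelta(days=7)
windows = split_into_windows(start, end, hours=24)
# Each window can then be queried separately, e.g.:
# for w_start, w_end in windows:
#     client.query_spans(query, start_time=w_start, end_time=w_end, ...)
```

Smaller windows keep each server-side scan bounded, at the cost of more round trips; the per-window results can be concatenated client-side.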
I'm still working on this issue. Can I ask whether you're dealing with orphan spans in this situation?
we released an update in arize-phoenix 8.32.1. would you like to give that a try and let us know if it has improved?
if you don't need orphan spans, there's now an extra parameter you can set to False
orphan_span_as_root_span=False,
Hey Roger, thanks for looking into this. I think we don't have orphan spans anymore, as our current implementation removes those relationships. We're still not using the orphan_span_as_root_span flag, but our implementation doesn't send orphan spans to Phoenix anymore.
ok got it. setting that flag to False should still speed up the query, because the client generates a different SQL query in that case
ok, I'll make sure to use it then... But will that have an instant effect or will I have to wait until I collect enough spans in that mode?
oh
wait, I see now, this is to pass in the Query
I thought this was a config in the tracer
right. if you know that the spans you鈥檙e looking for aren鈥檛 orphans, then you can set it to False
spans = phoenix_client.query_spans(
SpanQuery().where(where),
start_time=start_date,
end_time=end_date,
root_spans_only=True,
orphan_span_as_root_span=False,
timeout=60 * 5,
)
in my testing with 20 million spans, this cut the runtime in half
That didn't work for me. It still times out through phoenix.Client, while the same query in the UI is very fast. I changed our solution to query by trace_id instead, which performs much faster. It would be more convenient if we could use our own id, but it's ok, as we can proceed with the trace_id.
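For anyone landing on this thread, the trace_id workaround might look like the sketch below. It assumes the Phoenix span-filter DSL accepts a trace_id equality condition, and that you can resolve your conversation_id to a trace_id from your own application data; the trace id shown is a placeholder.

```python
def trace_filter(trace_id: str) -> str:
    # Build a filter string selecting spans belonging to one trace.
    # Assumption: the Phoenix filter DSL supports `trace_id == '...'`.
    return f"trace_id == '{trace_id}'"

# Hypothetical usage, with the trace id resolved from your own records:
# spans = phoenix_client.query_spans(SpanQuery().where(trace_filter(my_trace_id)))
example_filter = trace_filter("abc123")
```

Filtering on a single trace id keeps the scan narrow, which is consistent with the speedup reported above.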
