I have a self-hosted Phoenix instance that multiple services send traces to. Now I'd like to do some analysis on some of the traces, but APIs like px.Client().get_spans_dataframe() seem heavy on the node, especially when exporting large amounts of traces. It feels like the analytics workload (px.Client().get_spans_dataframe()) and the tracing workload (tracer.start_as_current_span) should be sent to separate nodes somehow. I wonder if anyone has advice/experience on this topic?
It seems likely that Phoenix does not natively support separating analytics (e.g., px.Client().get_spans_dataframe()) and tracing workloads onto different nodes out of the box. However, you can mitigate performance issues by running analytics queries on a replica or a separate read-only database, or by exporting traces for offline analysis rather than querying large datasets directly from the production node. This approach is not explicitly documented, but users have reported UI and API slowdowns with large trace volumes and have addressed them by optimizing database usage and offloading heavy queries elsewhere. Upgrading Phoenix to the latest version may also help, as recent releases include performance improvements for large trace sets and lazy loading in the UI (GitHub Issue #6201, Slack discussion).

For exporting traces, you can use px.Client().get_trace_dataset().save(directory=...) to persist data for offline analysis, reducing load on your main Phoenix node (Export Data & Query Spans). Running analytics on a separate environment or database replica is a common pattern in observability systems, though not directly described in the Phoenix docs. Would you like more detail on how to set up such a workflow?

Sources:
- GitHub Issue #6201
- Slack discussion
- Export Data & Query Spans
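A minimal sketch of the export-then-analyze-offline pattern described above. The export call (step 1) follows the documented px.Client().get_trace_dataset().save(...) API; the analysis function, column names, and directory path are stand-ins for illustration, since the exact shape of your exported spans table depends on your instrumentation:

```python
import pandas as pd

# Step 1 (run once, against the Phoenix node) -- hypothetical sketch:
#   import phoenix as px
#   px.Client().get_trace_dataset().save(directory="/data/phoenix_export")
# This persists the traces to disk so the heavy analysis below never
# touches the live Phoenix node again.

def slowest_spans(spans: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """Offline analysis on the exported copy: rank spans by latency."""
    out = spans.copy()
    out["latency_s"] = (out["end_time"] - out["start_time"]).dt.total_seconds()
    return out.nlargest(n, "latency_s")[["name", "latency_s"]]

# Stand-in frame shaped roughly like an exported spans table:
spans = pd.DataFrame({
    "name": ["llm_call", "retrieve", "rerank"],
    "start_time": pd.to_datetime(["2024-01-01 00:00:00"] * 3),
    "end_time": pd.to_datetime(
        ["2024-01-01 00:00:05", "2024-01-01 00:00:01", "2024-01-01 00:00:02"]
    ),
})
print(slowest_spans(spans, n=1))  # llm_call is the slowest at 5.0 s
```

The point of the split is that step 1 is the only interaction with the production node; everything after it runs anywhere you like.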
💡 Hint: Mention RunLLM in the thread for followups.
Hey Zhuyi X. - it's true that, depending on your use case, px.Client can over-fetch spans. That's why we added https://arize-phoenix.readthedocs.io/projects/client/en/latest/api/client.html#spans, which has cursor-based pagination. It should lighten the load. We do have plans for things like sampling / complex filtering too, but they're not yet implemented, so let us know your use case if you have one!
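Cursor-based pagination like the linked client exposes follows a standard loop shape. A generic sketch of that pattern (fetch_page and the field names here are stand-ins, not the actual client API; in practice the fetch would be an HTTP call returning a batch plus a next-page cursor):

```python
from typing import Optional

# Stand-in for the server-side data; the real endpoint would page through
# spans stored in Phoenix.
FAKE_SPANS = [{"id": i} for i in range(10)]

def fetch_page(cursor: Optional[int], limit: int = 4):
    """Return one bounded page and the cursor for the next one."""
    start = cursor or 0
    page = FAKE_SPANS[start : start + limit]
    next_cursor = start + limit if start + limit < len(FAKE_SPANS) else None
    return page, next_cursor

def iter_all_spans(limit: int = 4):
    """Walk every page, holding at most `limit` spans per request."""
    cursor = None
    while True:
        page, cursor = fetch_page(cursor, limit)
        yield from page
        if cursor is None:
            break

assert len(list(iter_all_spans())) == 10
```

Because each request is bounded by the page size, the server never has to materialize the whole result set at once, which is what lightens the load compared to a single bulk export.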
My company supports CDC (change data capture) export to a data warehouse, so we can then query the traces via SQL. The downside is that we can't use the px.Client() APIs on that copy. But my hunch is still that exporting the data somewhere else for analytics is probably better (this is also what the bot suggested). This feels like the typical OLTP (Online Transaction Processing) vs. OLAP (Online Analytical Processing) split.
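A sketch of what the warehouse side of that split looks like, using an in-memory SQLite database as a stand-in for the warehouse; the spans schema here is assumed, since the real one depends on how your CDC pipeline lands the trace data:

```python
import sqlite3

# Stand-in for a warehouse table fed by CDC from Phoenix (schema assumed).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE spans (
        trace_id TEXT, name TEXT, latency_ms REAL
    )
""")
conn.executemany(
    "INSERT INTO spans VALUES (?, ?, ?)",
    [("t1", "llm_call", 900.0), ("t1", "retrieve", 120.0),
     ("t2", "llm_call", 1100.0)],
)

# The analytical (OLAP) query runs against the warehouse, never against
# the Phoenix node that is busy ingesting traces (OLTP-style workload):
rows = conn.execute("""
    SELECT name, AVG(latency_ms) AS avg_ms, COUNT(*) AS n
    FROM spans GROUP BY name ORDER BY avg_ms DESC
""").fetchall()
print(rows)  # [('llm_call', 1000.0, 2), ('retrieve', 120.0, 1)]
```

The trade-off is exactly the one named above: you give up the px.Client() convenience APIs, but arbitrary aggregations like this no longer compete with ingestion for resources.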
That works too 🙂 Just for clarity, we have a much more lightweight client (https://pypi.org/project/arize-phoenix-client/); this is different from px.Client, as we are migrating. It taps into this API: https://arize.com/docs/phoenix/sdk-api-reference/rest-api/spans
To make sure I understand: "light" here just means the package is small, right? But when calling Phoenix to export or process large amounts of data, using px.Client or arize-phoenix-client won't make much difference, right? After all, they're calling the same Phoenix server that also accepts traces from the other services.
So worth a try
I see. Thanks for the insights!
I may try again if I find the client API much more useful than SQL.
but SQL / the client is the most stable option
