I have a self-hosted Phoenix instance that multiple services send traces to. Now I'd like to do some analysis on some of the traces, but APIs like px.Client().get_spans_dataframe() seem heavy on the node, especially when exporting a large number of traces. It feels like the analytics workload (px.Client().get_spans_dataframe()) and the tracing workload (tracer.start_as_current_span) should be sent to separate nodes somehow. Does anyone have advice or experience on this topic?
Hey Zhuyi X. - it's true that, depending on your use case, px.Client can over-fetch spans. That's why we added https://arize-phoenix.readthedocs.io/projects/client/en/latest/api/client.html#spans, which has cursor-based pagination. It should lighten the load. We do have plans for things like sampling and complex filtering too, but they're not implemented yet, so let us know your use case if you have one!
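For anyone reading along, here is a minimal sketch of what cursor-based pagination looks like from the caller's side. It is not the Phoenix client itself: `fetch_page` is a hypothetical callable you would write around whichever spans endpoint or client method you use, and the `(items, next_cursor)` return shape and the `pause_s` throttle are assumptions made for illustration. The in-memory fetcher only stands in for a server so the loop can be run as-is.

```python
import time
from typing import Callable, Iterator, List, Optional, Tuple

# Assumed page shape: (items, next_cursor); next_cursor is None on the last page.
Page = Tuple[List[dict], Optional[str]]


def iter_spans(
    fetch_page: Callable[[Optional[str], int], Page],
    page_size: int = 100,
    pause_s: float = 0.0,
) -> Iterator[dict]:
    """Yield spans one page at a time instead of pulling everything at once."""
    cursor: Optional[str] = None
    while True:
        items, cursor = fetch_page(cursor, page_size)
        yield from items
        if cursor is None:
            break
        if pause_s > 0:
            # Optional pause between pages to spread load on the server.
            time.sleep(pause_s)


def make_fake_fetcher(spans: List[dict]) -> Callable[[Optional[str], int], Page]:
    """Hypothetical stand-in for a server: the cursor is just a list offset."""

    def fetch_page(cursor: Optional[str], limit: int) -> Page:
        start = int(cursor) if cursor is not None else 0
        end = start + limit
        next_cursor = str(end) if end < len(spans) else None
        return spans[start:end], next_cursor

    return fetch_page


if __name__ == "__main__":
    demo = [{"span_id": str(i)} for i in range(250)]
    total = sum(1 for _ in iter_spans(make_fake_fetcher(demo), page_size=100))
    print(total)
```

Because the generator only holds one page in memory, the caller can stop early or write each page to disk, which is the main way pagination lightens a large export.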
My company does support exporting CDC (change data capture) to a data warehouse, and we can then query the traces via SQL. The downside is that we then won't be able to use the APIs from px.Client(). But my hunch is still that exporting the data somewhere else for analytics is probably better (this was also suggested by the bot). This feels like a typical case of OLTP (Online Transaction Processing) vs. OLAP (Online Analytical Processing).
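As a rough illustration of the warehouse route, the sketch below runs an aggregate query over exported spans. Everything about the schema is an assumption: the `spans` table, its columns (`trace_id`, `span_id`, `name`, `latency_ms`, `status_code`), and the sample rows are hypothetical, and an in-memory SQLite database stands in for the real warehouse.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the warehouse (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE spans ("
        "trace_id TEXT, span_id TEXT, name TEXT, "
        "latency_ms REAL, status_code TEXT)"
    )
    rows = [
        ("t1", "s1", "llm.call", 120.0, "OK"),
        ("t1", "s2", "retriever", 30.0, "OK"),
        ("t2", "s3", "llm.call", 180.0, "ERROR"),
        ("t2", "s4", "retriever", 50.0, "OK"),
    ]
    conn.executemany("INSERT INTO spans VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def latency_by_span_name(conn: sqlite3.Connection) -> list:
    """Per span name: span count, average latency, and number of errors."""
    query = """
        SELECT name,
               COUNT(*) AS n,
               AVG(latency_ms) AS avg_ms,
               SUM(CASE WHEN status_code = 'ERROR' THEN 1 ELSE 0 END) AS errors
        FROM spans
        GROUP BY name
        ORDER BY name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in latency_by_span_name(build_demo_db()):
        print(row)
```

The point of the split is that queries like this scan many rows and run against the copy, so they never compete with trace ingestion on the Phoenix node.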
That works too 🙂 Just for clarity, we have a much more lightweight client (https://pypi.org/project/arize-phoenix-client/). It is different from px.Client, as we are migrating. It taps into this API: https://arize.com/docs/phoenix/sdk-api-reference/rest-api/spans
Just for clarity we have a much more light weight client (https://pypi.org/project/arize-phoenix-client/)
To make sure I understand: "light" here just means the package is small, right? But when calling Phoenix to export or process large amounts of data, using px.Client or arize-phoenix-client won't make much difference, right? After all, they're calling the same Phoenix server that also accepts traces from different services.
So worth a try
I see. Thanks for these insights!
I may try again if I find the client API much more useful than SQL.
but sql / client is the most stable
