Hello, I am reaching out to report tracing inconsistencies we are encountering on the Arize platform when tracing ADK agents. The behavior changes significantly depending on the version of the Google SDK / OpenInference instrumentation we test with ADK web. With the latest version of ADK, we see major tracing inconsistencies (orphan traces, randomly ordered trace trees). Here are the details.
Below is a detailed breakdown of our observations for versions 1.21.0, 1.22.1, and 1.23.0+.

1. Arize with ADK v1.21.0

Everything works fine with this version of ADK.

2. Observations on v1.21.0 vs v1.22.1 (Intermittent Orphan Spans)

Overall, tracing works well on version 1.22.1. However, depending on the CPU load, the degree of asynchrony, and the number of parallel requests, we occasionally end up with orphan spans in Arize. The main architectural difference we identified is that v1.22.1 forces PROGRESSIVE_SSE_STREAMING, whereas it was disabled in v1.21.0. That said, my latest isolated test using ADK web + v1.22.1 with PROGRESSIVE_SSE_STREAMING and StreamingMode=SSE worked perfectly, and the traces were correctly stitched in Arize. Despite this successful test, we still recorded a few orphan spans yesterday under real-world load.

3. Observations on v1.23.0 and above (Broken traces due to instrumentation limitations)

For version 1.23.0 and higher, the issue seems to be completely different and more systemic. Google has introduced a new tracing.tracer in the core SDK (which is fundamentally a good architectural update). However, it appears that the openinference-instrumentation-google-adk module has not caught up with this change: it currently targets and replaces only base_llm_flow.tracer. We strongly suspect that this partial patching is the root cause of the side effects and broken traces we see in versions 1.23.0 and above.
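
To make the suspected mechanism for v1.23.0+ concrete, here is a minimal sketch of how replacing only one module-level tracer reference leaves the other one untouched. It uses plain Python stand-ins, not the real ADK or OpenInference code: RecordingTracer, run_agent, call_llm, and the two SimpleNamespace "modules" are hypothetical, and only the names tracing.tracer and base_llm_flow.tracer come from our observations above.

```python
import types


class RecordingTracer:
    """Hypothetical stand-in for a tracer; it records the span names it receives."""

    def __init__(self, name):
        self.name = name
        self.spans = []

    def start_span(self, span_name):
        self.spans.append(span_name)
        return f"{self.name}:{span_name}"


# Hypothetical stand-ins for two SDK modules that each expose a module-level tracer.
sdk_tracer = RecordingTracer("sdk")
tracing = types.SimpleNamespace(tracer=sdk_tracer)
base_llm_flow = types.SimpleNamespace(tracer=sdk_tracer)


def run_agent():
    # Assumed: some SDK code paths create spans through tracing.tracer.
    return tracing.tracer.start_span("invoke_agent")


def call_llm():
    # Assumed: other code paths create spans through base_llm_flow.tracer.
    return base_llm_flow.tracer.start_span("call_llm")


# The instrumentation replaces only ONE of the two references.
instrumented = RecordingTracer("instrumented")
base_llm_flow.tracer = instrumented

emitted = [run_agent(), call_llm()]
print(emitted)  # ['sdk:invoke_agent', 'instrumented:call_llm']
```

If the real SDK behaves like this sketch, spans created through the unpatched reference would bypass the instrumented tracer, which could explain the broken trace trees. We have not confirmed this against the ADK source.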
