Challenges Integrating Phoenix OSS with OpenTelemetry for Dual Backend Tracing
Hi everyone, forgive me for the looong text. 😅 I’ve been exploring different ways to integrate a test project with Phoenix OSS using OpenTelemetry, to figure out how my team can use Phoenix effectively in parallel with the other OpenTelemetry backend we use to monitor the application. So I built a "pet project" to test different ways of feeding both backends simultaneously: Phoenix (for tracing and evals) and Jaeger (for general application and infrastructure monitoring). After some trial and error, and with help from 🔒[private user] and Anthony P., I ended up with two integration approaches:
1. Single Global Context: a single global context configured to feed two distinct trace providers (required to handle the separate backend addresses).
2. Manual Separation: manually creating separate tracers and contexts with distinct spans, one per backend, and manually choosing which span to use in each method.
Ultimately, I couldn't get Phoenix to behave as expected in either case. I’d like to confirm whether I’m missing something or whether these are practical limitations of the current UI/UX.

Starting with strategy #2 (Manual Separation): while manually managing separate contexts works, it is far more complex and error-prone. I couldn't capture all spans properly when using frameworks like Strands Agents or the LiteLLM proxy, which plug directly into the global OpenTelemetry SDK, so I’m ruling this out as a viable solution.

The issue with strategy #1 (Single Global Context / Dual Providers): this approach works well only if I initialize telemetry exactly where the code starts agent tasks. Phoenix seems to expect the OpenInference attributes to be present on the root span (please see the attached images). In a "real-world" application, however, the root span is created with application data (e.g., an HTTP handler or Lambda entry point) before any prompt, LLM, or agent data is captured. The spans containing the OpenInference info will always be child spans, never the root.

To illustrate this, I ran the same "dialogue" twice: once via a pytest test (where the agent invocation produces the root span) and once via the application (a Lambda function, where the agent span is a child of the application span).

pytest scenario (Agent = Root Span):
phoenix-pytest-traces.png: "status", "input", and "output" columns contain proper values as expected.
phoenix-pytest-sessions.png: Sessions are tracked properly with the first and last input.
phoenix-pytest-session-details.png: User can see the AI and Human input/outputs and easily annotate from this screen.
phoenix-pytest-trace-spans.png: This shows the OpenInference attributes are present in the root span.
Application scenario (Agent = Child Span):
phoenix-app-traces.png: "status", "input", and "output" are blank/undefined.
phoenix-app-sessions.png: Sessions are tracked, but first/last inputs are blank.
phoenix-app-session-details.png: Nothing is shown; this screen becomes useless for annotation.
phoenix-app-trace-spans.png: This shows the root span is an "extra span" (created by the app code) without OpenInference attributes. Inspecting the second span shows it contains the data (identical to the root span in the pytest scenario), but the UI doesn't seem to pick it up.
My Questions:
How are you handling telemetry in production projects where the Agent is not the root span?
How do you monitor the application (infrastructure) while simultaneously using Phoenix for LLM tracing/evals without these UI issues?
Given the need for distinct addresses, is my setup of feeding two trace providers from a global context the correct way to handle this?
Any insights would be appreciated, and big thanks in advance to all (and in particular to 🔒[private user] and Anthony P. for their help)!
