·
·

For teams running agents in production: have you seen retries or loops burn money before anyone noticed? I'm specifically interested in:
• repeated retries on large contexts
• agents failing to reach an end state
• cases where traces/logs existed but still didn't make the failure obvious
One real example would help a lot.

·
·

Hello, I have installed Phoenix on my openclaw instance and have been implementing tracing using OTel. This is my current setup, with Phoenix running locally and the diagnostics.otel configuration. As you can see, I am able to capture traces; however, I am unable to fill out all the columns (input, output, tool calls, etc.). Can someone guide me as to what I am missing in my OTel config to populate everything? Thanks!

2 attachments
·
·

Hi Anson M.! Great to connect. I am equally interested in that "messy route": moving past the hype phase we find ourselves in and into the actual mechanics of production reliability. I am relatively new to this path but diving in headfirst. My primary focus is the value proposition within the African market. Right now there is massive excitement across the continent, but a significant knowledge gap exists around how agents actually perform in production, which I suspect is a global problem. I am specifically interested in how token costs, failure patterns, and success rates impact the broader business ecosystem. We need to move from the hype of a working demo to real metrics that justify the cost of implementation. I believe building trustworthy and operable systems is the only way to bridge that gap and create actual economic value here. Looking forward to learning from your experience in observability and failure handling! Let's connect to see how we can drive this forward!

·
·

I'm trying to do a cost-by-customer dashboard, but it's not quite working the way I want. I received these instructions:

You can define a custom metric for cost (e.g. SELECT SUM(cost) FROM MODEL), then add it to a dashboard as a line chart. From there, you can group by any dimension on your traces/spans. So if you're tagging each trace/span with a customer identifier, you can group by that dimension and get a graph that breaks down LLM cost by customer.

But the chart looks wrong: it isn't displayed in dollars and cents, and I can't tell whether the amounts are in the right units, or even whether they are correct. Alyx seems to be down, or I would just ask there. Is there a step I'm missing?

·
·

Hi all. I'm Anson, excited to be here. I’m an engineering leader working on AI products and have gotten increasingly interested in the practical side of making agents reliable in production: evals, observability, failure handling, durable workflows, and all the messy stuff that shows up after the demo works. I joined because I’m looking for people who care about the same thing: how to build agentic systems that are actually trustworthy, inspectable, and operable in the real world. Would love to hear what folks here are working on, especially if you’ve been deep in agent reliability, tracing, or evals.

·
·

For people working with agents in production: Have you seen cases where retries, tool loops, or context issues caused wasted cost or hard-to-debug failures? A short real example would be super helpful.

·
·

Hey everyone! Julian V. here, currently deep-diving into agent reliability and production-grade debugging. I'm obsessed with solving 'Day 3' failures, where agents behave in dev but hit infinite loops or context drift in the wild. I joined this community because I believe Arize/Phoenix is the best place to find people actually dealing with these messy production traces. Looking forward to trading some 'war stories' about unstable agentic workflows.

·
·

Hi everyone, I hope you're doing well. This might be a basic question (maybe too basic), but our team is wondering whether there are any plans to support Python 3.14 in the coming months, or if there is an estimated timeline for when it will be available in a future version of Phoenix. We currently have several agents in place, along with observability already set up, and we are looking to move forward with further development. However, we would like to use the new version of the Google ADK framework, which introduces compatibility issues with the Python versions currently supported. Any insights or updates would be greatly appreciated. Thank you in advance!

·
·

Aryan D. Hey Aryan! Glad forceFlush() fixed the annotation timing issue. To your follow-up — yes, you can annotate the LangGraph node span directly instead of creating a custom span. The challenge is that trace.getActiveSpan() doesn't work inside your node function because LangGraph's async scheduling breaks OTel context propagation from the OpenInference instrumentor. Here are two approaches:

Option 1: Custom SpanProcessor (recommended)
Register a SpanProcessor with your TracerProvider that captures span IDs as they're created. This sees every span the LangChain instrumentor produces:

import { SpanProcessor, ReadableSpan } from '@opentelemetry/sdk-trace-base';
class SpanIdCapture implements SpanProcessor {
  private spanMap = new Map<string, string>();
  onStart() {} // no-op: we only need completed spans
  onEnd(span: ReadableSpan) {
    this.spanMap.set(span.name, span.spanContext().spanId);
  }
  getSpanId(name: string) { return this.spanMap.get(name); }
  forceFlush() { return Promise.resolve(); }
  shutdown() { return Promise.resolve(); }
}
const capture = new SpanIdCapture();
tracerProvider.addSpanProcessor(capture);
// After your graph invocation completes:
const judgeSpanId = capture.getSpanId('YourJudgeNodeName');
// Use judgeSpanId as recordId in batchUpdateAnnotations

If the node can run multiple times (e.g. in a loop), key by both span name and trace ID, or collect into an array.

Option 2: If using Python with BaseCallbackHandler
The trick is to call trace.get_current_span() inside the callback method (where the instrumentor has set the OTel context), not inside your node function:

from opentelemetry import trace
class SpanCaptureHandler(BaseCallbackHandler):
    def __init__(self):
        self.span_ids = {}
    def on_chain_start(self, serialized, inputs, *, run_id, **kwargs):
        span = trace.get_current_span()
        if span and span.is_recording():
            ctx = span.get_span_context()
            name = serialized.get("name", "")
            self.span_ids[name] = format(ctx.span_id, '016x')

Then after invocation: handler.span_ids['your_judge_node'] gives you the recordId. A couple of reminders for the annotation call:
• Use the actual span start timestamp as startTime (not midnight) — annotations are resolved against a UTC-day partition, so the wrong day means a silent miss
• Make sure recordId is lowercase hex, 16 chars (which it should be if sourced from OTel)
• Still call forceFlush() and add a short delay before annotating to ensure the span has landed in Arize
Hope that helps!
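(Editor's note: those reminders can be made concrete with a couple of stdlib helpers. A sketch assuming the Python OTel SDK, where span IDs are 64-bit integers and span start times are nanoseconds since the Unix epoch:)

```python
# Sketch: derive the annotation call's recordId and startTime from an
# OTel span. Span IDs are 64-bit ints -> 16-char lowercase hex; the
# Python SDK records span start times as nanoseconds since the epoch.
from datetime import datetime, timezone

def record_id(span_id: int) -> str:
    # 16-character, zero-padded, lowercase hex, as the mutation expects
    return format(span_id, "016x")

def start_time_iso(start_time_ns: int) -> str:
    # Real span start time in UTC with millisecond precision,
    # instead of a hard-coded midnight timestamp
    dt = datetime.fromtimestamp(start_time_ns / 1e9, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"
```

For example, record_id(0x207C5E526C68C0DE) gives "207c5e526c68c0de", matching the recordId shape in the payload below.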

·
·

Hi all, not sure if I am posting in the right channel, but here it goes. I started using Phoenix a month and a half ago. I was able to install Phoenix locally using uv pip install arize-phoenix and view the traces on localhost:6006. But now, when I do the same with the latest version, I see a 'Not Found' error in the browser as well as in the Phoenix logs when I hit the URL localhost:6006. Please note that pushing traces to Phoenix on 6006 does not result in any error; I am just not able to view them. Appreciate any help in advance.

·
·

Hi, in light of my previous question: I got the annotation working by calling forceFlush() before the mutation, thanks! But I was wondering if there is a way to annotate a specific LangGraph node from the trace instead of creating a custom span. I tried using trace.getActiveSpan() inside the LangGraph node to fetch the judge span it creates, but it's not accessible. I also tried implementing a BaseCallbackHandler attached to the graph invocation which filters by node name and tries to capture the span inside the callback context, but that didn't work. Is there a better approach to annotate the LangChainInstrumentation node or capture its ID? Appreciate any help I can get, thanks! 🙂

·
·

Hi I'm Lex, SWE of 12 years here. I'm just getting into AI and would love some great resources! Also, how does one gain the ability to post in the Arize News channel?

·
·

Hey! I have a LangGraph-based AI workflow and am using the batchUpdateAnnotations mutation to automatically label spans when our LLM judge node fails (e.g. timeout, parse error). We create a custom OTel span via startActiveSpan, extract the span ID, and fire the annotation after span.end() using a BatchSpanProcessor. The mutation consistently returns BatchUpdateAnnotationSuccess: true with the correct spanId and label, and the span itself is visible in the trace UI, but the Annotations tab for that span is always empty. Here is the exact payload being sent:

{
  "input": {
    "modelId": "<base64-encoded project ID>",
    "recordAnnotationUpdates": [
      {
        "recordId": "207c5e526c68c0de",
        "startTime": "2026-03-27T00:00:00.000Z",
        "annotationUpdates": [
          {
            "annotationConfigId": "<our config ID>",
            "annotation": {
              "name": "Judge Failure Annotation",
              "label": "llm_error",
              "annotationType": "Label"
            }
          }
        ],
        "note": { "text": "..." }
      }
    ]
  }
}

Would appreciate any help I can get. Thanks! 😄

·
·

Hi everyone, hope you’re all doing well 👋 I have a question about Langflow’s execution behavior and how it generates traces in Arize. From what I’ve observed, every time a flow runs, it produces two traces: • An AgentExecutor trace (OpenAI agent in my case) • A UUID trace (Langflow component trace) At first this seemed fine, but when I started looking at latency metrics (p50 & p99), I noticed that the AgentExecutor trace is also included in the calculations. My expectation was to track only the UUID traces, since they better represent the latency of each request to the Langflow service. A couple of questions: • Is it expected that AgentExecutor and UUID traces are mixed in latency calculations? • How can I track latency using only UUID traces? Thanks in advance — really appreciate any clarification!
