Hello Phoenix Community! RunLLM, I'm currently working on an application that uses:
Phoenix for tracing
AutoGen for multi-agent conversations
OpenAI for Large Language Model (LLM) calls
I have set up tracing in my `Chatbot` class using `src.chatbot.pieces.tracing_piece.py` as follows:
# tracing_piece.py
# ...
from openinference.instrumentation.autogen_agentchat import AutogenAgentChatInstrumentor
from openinference.instrumentation.openai import OpenAIInstrumentor
# ...
class PhoenixTracing:
    # ...
    def initialize(self):
        # ...
        OpenAIInstrumentor().instrument(tracer_provider=self.tracer_provider)
        AutogenAgentChatInstrumentor().instrument(tracer_provider=self.tracer_provider)
    # ...

And `tracing_piece.initialize()` is called within the `Chatbot` class constructor.

According to the Phoenix cost tracking documentation, auto-instrumentation with OpenAI and other supported providers should automatically capture token counts and model information using OpenInference semantic conventions. Specifically, it states that the llm.token_count.prompt, llm.token_count.completion, and llm.token_count.total attributes are required on each LLM span for cost aggregation.

Despite having both OpenAIInstrumentor and AutogenAgentChatInstrumentor enabled, I am not seeing cumulative token usage or cost data reflected in my Phoenix traces. Individual LLM spans do not appear to have the expected `llm.token_count.*` attributes attached.

Could anyone provide guidance on why token usage might not be getting traced and aggregated with this setup? Any help would be greatly appreciated! Thank you.
If you are not seeing token usage or cost data in Phoenix traces with both OpenAIInstrumentor and AutogenAgentChatInstrumentor enabled, the most likely cause is that the OpenAI LLM calls are not being properly instrumented, or the token count attributes are not being attached to the correct spans. Phoenix cost tracking requires that each LLM span includes the llm.token_count.prompt, llm.token_count.completion, llm.token_count.total, llm.model_name, and llm.provider attributes, which are automatically captured only if the OpenAI instrumentation is correctly applied to the actual OpenAI client calls in your application. If these attributes are missing, cost aggregation will not work as expected. This is confirmed in the cost tracking documentation and the OpenInference OpenAI Instrumentation README.

A common issue is that the OpenAI client is imported or instantiated before the OpenAIInstrumentor().instrument() call, which prevents the instrumentor from monkey-patching the client and capturing token usage. Ensure that the OpenAI instrumentor is initialized before any OpenAI client is created or used. Also verify that your OpenAI SDK version is at least 1.26 and that, if you are streaming completions, you set stream_options={"include_usage": True} to get token counts (FAQ). If you are using AutoGen, make sure it is actually calling the OpenAI Python SDK and not another backend, as only supported providers are auto-instrumented for token tracking.

If you still do not see token counts, try running a minimal script with only OpenAI instrumentation to confirm that token usage is captured, then incrementally add back AutoGen and other components.

Would you like a step-by-step checklist or a minimal working example to help debug this further?

Sources:
- Phoenix Cost Tracking Documentation
- OpenInference OpenAI Instrumentation README
- Phoenix Tracing FAQ
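A minimal script along those lines might look like the sketch below. This is a setup fragment, not a definitive implementation: it assumes a local Phoenix instance reachable via the `PHOENIX_COLLECTOR_ENDPOINT` environment variable and an `OPENAI_API_KEY` in the environment; the project name, model, and prompt are placeholders.

```python
# Sketch: instrument OpenAI *before* creating any client, then make one call.
# Assumes Phoenix is reachable via PHOENIX_COLLECTOR_ENDPOINT and that
# OPENAI_API_KEY is set; project name and model are placeholders.
from phoenix.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor

tracer_provider = register(project_name="token-debug")
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

# Import and instantiate the client only AFTER instrumenting.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "ping"}],
)
print(response.usage)  # token counts here should also land on the LLM span
```

If the resulting span in Phoenix carries `llm.token_count.*` attributes, the instrumentation itself works and the problem lies in the AutoGen layer or span processing.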
💡 Hint: Mention RunLLM in the thread for followups.
RunLLM I am able to see the models_usage field (containing prompt_tokens and completion_tokens) directly within the individual messages array that forms part of the SelectorGroupChat.run_stream span. An example of this is:
{
  "messages": [
    {
      "id": "e07eeae7-9fe4-4c05-adac-5b318017baa2",
      "source": "user",
      "models_usage": null,
      "metadata": {},
      "created_at": "2025-09-05T08:48:07.043629+00:00",
      "content": "User. Question",
      "type": "TextMessage"
    },
    {
      "id": "8770aaf0-0437-41f8-82ff-8cf2643d1447",
      "source": "PlanningAgent",
      "models_usage": {
        "prompt_tokens": 275,
        "completion_tokens": 68
      },
      "metadata": {},
      // ...
    }
  ],
  "stop_reason": "Text 'TERMINATE' mentioned"
}

Having models_usage (with prompt_tokens and completion_tokens) inside your messages array is not sufficient for Phoenix cost tracking. Phoenix requires these token counts to be present as top-level span attributes: llm.token_count.prompt, llm.token_count.completion, and llm.token_count.total (source). If the token usage is only nested within message payloads and not promoted to span attributes, Phoenix will not aggregate or display cost data.

To resolve this, ensure your instrumentation extracts models_usage from each message and sets the corresponding span attributes. This may require a custom callback or patch to your tracing logic to map models_usage fields to the required OpenInference attributes on the relevant LLM spans (source).

Would you like a code example for extracting and setting these attributes?

Sources:
- Phoenix Cost Tracking Documentation
- Phoenix GitHub Issue #8488
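The promotion step can be sketched framework-free. The snippet below walks a messages payload shaped like the one above and sums models_usage into the three counts Phoenix expects; it is pure Python, the attribute names follow the OpenInference conventions, and `summarize_token_usage` is a made-up helper name, not a Phoenix API.

```python
# Sketch: sum models_usage across messages into the flat attributes Phoenix
# expects on an LLM span. `summarize_token_usage` is a hypothetical helper.
def summarize_token_usage(messages):
    prompt = completion = 0
    for msg in messages:
        usage = msg.get("models_usage") or {}  # may be None for user messages
        prompt += usage.get("prompt_tokens", 0)
        completion += usage.get("completion_tokens", 0)
    return {
        "llm.token_count.prompt": prompt,
        "llm.token_count.completion": completion,
        "llm.token_count.total": prompt + completion,
    }

messages = [
    {"source": "user", "models_usage": None},
    {"source": "PlanningAgent", "models_usage": {"prompt_tokens": 275, "completion_tokens": 68}},
]
print(summarize_token_usage(messages))
# → {'llm.token_count.prompt': 275, 'llm.token_count.completion': 68, 'llm.token_count.total': 343}
```

The resulting dictionary is what would then be written onto the span via `set_attribute` calls.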
Yes please, give an overall detailed answer with a step-by-step solution. RunLLM
Certainly! Here’s a step-by-step solution to ensure Phoenix cost tracking works by extracting token usage from your message payloads and setting them as span attributes.

---

Step 1: Understand the Requirement

Phoenix cost tracking requires that each LLM span has the following attributes set at the span level (not just inside message payloads):
- llm.token_count.prompt
- llm.token_count.completion
- llm.token_count.total
- llm.model_name
- llm.provider

Phoenix will not aggregate or display cost data if these are missing, even if token counts are present in nested fields like models_usage (source, GitHub Issue #8488).

---

Step 2: Extract Token Usage from Message Payloads

Suppose you have a message structure like this:

```python
message = {
    "models_usage": {
        "prompt_tokens": 275,
        "completion_tokens": 68,
    },
    # ... other fields ...
}
```

You need to extract these values and compute the total:

```python
prompt_tokens = message["models_usage"]["prompt_tokens"]
completion_tokens = message["models_usage"]["completion_tokens"]
total_tokens = prompt_tokens + completion_tokens
```

---

Step 3: Set Span Attributes Manually

If you are using OpenTelemetry directly or customizing spans, set these attributes on the current span:

```python
from opentelemetry import trace

span = trace.get_current_span()
span.set_attribute("llm.token_count.prompt", prompt_tokens)
span.set_attribute("llm.token_count.completion", completion_tokens)
span.set_attribute("llm.token_count.total", total_tokens)
span.set_attribute("llm.model_name", model_name)    # Set this to your model's name
span.set_attribute("llm.provider", provider_name)   # Set this to your provider, e.g. "openai"
```

---

Step 4: Integrate with Your Tracing Logic

If you’re using a callback or custom handler (e.g., in LangChain, AutoGen, or your own framework), add this logic where you process LLM responses.
For example, in a callback’s on_llm_end method:

```python
def on_llm_end(self, response, **kwargs):
    # Extract token usage from response/messages
    models_usage = response.get("models_usage")
    if models_usage:
        prompt_tokens = models_usage.get("prompt_tokens", 0)
        completion_tokens = models_usage.get("completion_tokens", 0)
        total_tokens = prompt_tokens + completion_tokens

        span = trace.get_current_span()
        span.set_attribute("llm.token_count.prompt", prompt_tokens)
        span.set_attribute("llm.token_count.completion", completion_tokens)
        span.set_attribute("llm.token_count.total", total_tokens)
        span.set_attribute("llm.model_name", "gpt-3.5-turbo")  # Example
        span.set_attribute("llm.provider", "openai")           # Example
```
If you have multiple messages, repeat this for each relevant LLM-generated message.

---

Step 5: Verify in Phoenix

After running your application, check the Phoenix UI. The LLM spans should now display token usage and cost data. If not, inspect the raw span attributes to ensure the fields are present at the top level.

---

Step 6: Automate for All LLM Calls (Optional)

References and Examples:
- Phoenix Cost Tracking Documentation
- Phoenix GitHub Issue #8488
- Manual Instrumentation Example

---

Example Code Block

```python
from opentelemetry import trace

def set_llm_token_attributes(message, model_name, provider_name):
    models_usage = message.get("models_usage")
    if models_usage:
        prompt_tokens = models_usage.get("prompt_tokens", 0)
        completion_tokens = models_usage.get("completion_tokens", 0)
        total_tokens = prompt_tokens + completion_tokens

        span = trace.get_current_span()
        span.set_attribute("llm.token_count.prompt", prompt_tokens)
        span.set_attribute("llm.token_count.completion", completion_tokens)
        span.set_attribute("llm.token_count.total", total_tokens)
        span.set_attribute("llm.model_name", model_name)
        span.set_attribute("llm.provider", provider_name)
```

Call this function after each LLM call, passing the message and model/provider info.

---

Let me know if you need a more specific example for your framework or want help integrating this into your codebase!

Sources:
- Phoenix Cost Tracking Documentation
- Phoenix GitHub Issue #8488
- Manual Instrumentation Example
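One way to sanity-check the mapping in set_llm_token_attributes before wiring it into real spans is a quick run against a stand-in span object. In this sketch, `FakeSpan` and the explicit `span` parameter are test conveniences, not part of the Phoenix or OpenTelemetry API.

```python
# Sketch: exercise the attribute mapping from the example above with a
# stand-in span, so the logic can be checked without an OTel backend.
class FakeSpan:
    """Test double that records attributes in a plain dict."""
    def __init__(self):
        self.attributes = {}

    def set_attribute(self, key, value):
        self.attributes[key] = value

def set_llm_token_attributes(message, model_name, provider_name, span):
    # Same mapping as above, but the span is passed in explicitly
    # instead of being fetched via trace.get_current_span().
    models_usage = message.get("models_usage")
    if models_usage:
        prompt_tokens = models_usage.get("prompt_tokens", 0)
        completion_tokens = models_usage.get("completion_tokens", 0)
        span.set_attribute("llm.token_count.prompt", prompt_tokens)
        span.set_attribute("llm.token_count.completion", completion_tokens)
        span.set_attribute("llm.token_count.total", prompt_tokens + completion_tokens)
        span.set_attribute("llm.model_name", model_name)
        span.set_attribute("llm.provider", provider_name)

span = FakeSpan()
set_llm_token_attributes(
    {"models_usage": {"prompt_tokens": 275, "completion_tokens": 68}},
    "gpt-4.1", "openai", span,
)
print(span.attributes["llm.token_count.total"])  # → 343
```

Once the dict contains all five keys, the same mapping can be pointed at a real span.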
Hey Swapnil J., can you share which model you are using?
Xander S. I am using OpenAI's gpt-4.1.
RunLLM Xander S. I checked the issue. When I use FilteringSpanProcessor and filter out 'autogen create', 'autogen send', 'autogen process', 'autogen publish', 'autogen ack', 'create_agent', 'autogen consume', 'autogen receive', etc., I get a filtered trace in which the tokens are not visible. But when I don't filter, calculated tokens are visible on the Phoenix dashboard.
When you use a custom FilteringSpanProcessor to filter out spans like 'autogen create', 'autogen send', etc., you are likely removing the LLM spans where token counts are attached. Phoenix cost tracking relies on these LLM spans to aggregate and display token usage; if they are filtered out, token data will not appear in the dashboard. This is expected behavior, as documented in the Phoenix span filtering guide: filtering out spans removes their data from export and downstream analytics.

If you want token counts to be visible, avoid filtering out LLM-related spans, or ensure that the spans you keep include the required token count attributes.

Would you like guidance on how to filter only non-LLM spans while preserving token tracking?

Sources:
- Phoenix span filtering guide
RunLLM Can you tell me which filters I should remove?
"""
The `tracing_piece.py` module provides tracing capabilities for the chatbot application.
This module uses:
1. **OpenTelemetry**: Core tracing infrastructure.
2. **OpenInference**: Extends OpenTelemetry with AI/ML specific conventions.
3. **Arize Phoenix**: Open-source observability platform for AI applications.
The tracing uses default auto-instrumentation for OpenAI calls with custom filtering.
Key classes and methods:
- `FilteringSpanProcessor`: A custom span processor that filters out unwanted AutoGen internal spans to reduce noise in the tracing data.
- `PhoenixTracing`: The main tracing class that configures and manages the entire tracing system.
- `__init__(self)`: Initializes the tracing instance.
- `initialize(self)`: Configures and starts the Phoenix tracing system. It requires `PHOENIX_COLLECTOR_ENDPOINT`, `PHOENIX_PROJECT_NAME` and `PHOENIX_API_KEY` environment variables.
- It registers with Phoenix OTEL, adds custom filtering, instruments OpenAI, and sets up a global tracer.
"""
import os
from openinference.instrumentation.autogen_agentchat import AutogenAgentChatInstrumentor
from openinference.instrumentation.openai import OpenAIInstrumentor
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from phoenix.otel import register


class FilteringSpanProcessor(BatchSpanProcessor):
    """Custom span processor that filters out unwanted AutoGen spans."""

    def _filter_condition(self, span) -> bool:
        """Check if span should be filtered out."""
        # Filter out verbose AutoGen internal spans that don't add value
        unwanted_prefixes = [
            'autogen create',
            'autogen send',
            'autogen process',
            'autogen publish',
            'autogen ack',
            'create_agent',
            'autogen consume',
            'autogen receive',
        ]
        # Also filter out very short-lived spans that are noise
        unwanted_exact_names = ['_process_message', '_handle_response', '_format_message']
        span_name = getattr(span, 'name', '')
        return any(span_name.startswith(prefix) for prefix in unwanted_prefixes) or span_name in unwanted_exact_names

    def on_start(self, span, parent_context=None) -> None:
        """Override on_start to filter spans."""
        if self._filter_condition(span):
            return  # Skip this span
        super().on_start(span, parent_context)

    def on_end(self, span) -> None:
        """Override on_end to filter spans."""
        if self._filter_condition(span):
            return  # Skip this span
        super().on_end(span)


class PhoenixTracing:
    """Phoenix tracing configuration with auto-instrumentation for AutoGen agents."""

    def __init__(self):
        self.tracer_provider: TracerProvider | None = None
        self.tracer: trace.Tracer | None = None
        self._is_initialized = False

    def initialize(self):
        """Initialize Phoenix tracing with auto-instrumentation for AutoGen agents."""
        if self._is_initialized:
            return
        # Get environment variables
        phoenix_collector_endpoint = os.getenv('PHOENIX_COLLECTOR_ENDPOINT')
        phoenix_project_name = os.getenv('PHOENIX_PROJECT_NAME', 'mutb-manual-poc4')
        phoenix_api_key = os.getenv('PHOENIX_API_KEY')
        if not phoenix_collector_endpoint:
            return
        try:
            # Register with Phoenix OTEL with auto-instrumentation
            self.tracer_provider = register(
                project_name=phoenix_project_name, endpoint=phoenix_collector_endpoint, auto_instrument=True
            )
            # Add custom filtering processor
            if phoenix_api_key:
                # Use API key for authentication
                self.tracer_provider.add_span_processor(
                    FilteringSpanProcessor(
                        OTLPSpanExporter(
                            endpoint=phoenix_collector_endpoint, headers={'Authorization': f'Bearer {phoenix_api_key}'}
                        )
                    )
                )
            else:
                # Use without authentication
                self.tracer_provider.add_span_processor(
                    FilteringSpanProcessor(OTLPSpanExporter(endpoint=phoenix_collector_endpoint))
                )
            # # Instrument OpenAI for chat completion tracing
            # OpenAIInstrumentor().instrument(tracer_provider=self.tracer_provider)
            # Instrument AutoGen for chat completion tracing
            AutogenAgentChatInstrumentor().instrument(tracer_provider=self.tracer_provider)
            # Get the global tracer
            self.tracer = trace.get_tracer(__name__)
            self._is_initialized = True
        except Exception:
            self._is_initialized = False


# Global instance
tracing_piece = PhoenixTracing()

You should remove any filters that exclude LLM spans, as these are the spans where token usage and cost attributes are attached. In your FilteringSpanProcessor, the current filter excludes spans whose names start with:
- 'autogen create'
- 'autogen send'
- 'autogen process'
- 'autogen publish'
- 'autogen ack'
- 'create_agent'
- 'autogen consume'
- 'autogen receive'

and spans with the exact names:
- '_process_message'
- '_handle_response'
- '_format_message'

However, if any of your LLM (OpenAI) calls or their token-counting logic are instrumented under these names, filtering them out will remove the token usage from Phoenix. To ensure token tracking works, do NOT filter out spans that represent LLM calls (often named like 'openai.chat.completions', 'openai.completions', or similar). You can safely filter out only spans that are confirmed to be internal, non-LLM, and do not contain token usage attributes.

Recommended change:
- Remove any prefix or exact name from unwanted_prefixes and unwanted_exact_names if it matches the span name of your LLM calls or their direct parents.
- If unsure, temporarily disable the filter and inspect span names in Phoenix to identify which ones contain token usage, then adjust your filter to keep those.

For more, see the Phoenix span filtering guide.

Would you like help identifying the exact span names to keep for OpenAI LLM calls?

Sources:
- Phoenix span filtering guide
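One way to make such a filter safer is to short-circuit on anything that looks like an LLM span before applying the deny-list. Below is a sketch of the predicate in pure Python; the 'openai'/'chatcompletion'/'llm' markers are illustrative guesses, so verify them against the actual span names shown in your Phoenix UI before relying on them.

```python
# Sketch: a filter predicate that never drops LLM-looking spans, then applies
# the deny-list. LLM_MARKERS values are guesses; confirm real span names in
# the Phoenix UI.
UNWANTED_PREFIXES = ('autogen ack', 'create_agent')
LLM_MARKERS = ('openai', 'chatcompletion', 'llm')

def should_filter(span_name: str) -> bool:
    lowered = span_name.lower()
    if any(marker in lowered for marker in LLM_MARKERS):
        return False  # always keep spans that might carry token counts
    return any(span_name.startswith(prefix) for prefix in UNWANTED_PREFIXES)

print(should_filter('autogen ack message'))     # → True (dropped)
print(should_filter('ChatCompletion gpt-4.1'))  # → False (kept)
```

This keep-list-first shape means a new LLM span name that happens to collide with a deny-list prefix still survives filtering.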
RunLLM Now I have limited filtering to these only:

unwanted_prefixes = [
    # 'autogen create',
    # 'autogen send',
    # 'autogen process',
    # 'autogen publish',
    'autogen ack',
    'create_agent',
    # 'autogen consume',
    # 'autogen receive',
]

I can see the tokens, but the cost still remains zero.
If you now see token counts but cost remains zero, it means Phoenix is receiving token usage but cannot match your span’s model name and provider to a pricing entry. For cost to be calculated, the span must have the following attributes: llm.token_count.prompt, llm.token_count.completion, llm.token_count.total, llm.model_name, and llm.provider, and the model/provider combination must match an entry in Phoenix’s model pricing table (source).

Check your span attributes in Phoenix and confirm:
- llm.model_name matches a known model (e.g., gpt-4.1 or gpt-4).
- llm.provider is set (e.g., openai).
- If using a custom or new model, add or update its pricing in Phoenix under Settings → Models, ensuring the name pattern and provider match the span attributes (source).

Would you like step-by-step instructions for adding a custom model price entry?

Sources:
- Phoenix Cost Tracking Documentation
- Phoenix GitHub Issue #8465
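For intuition, the arithmetic performed once a span matches a pricing entry is just tokens times the per-token rate, and no match means no cost. The sketch below uses made-up prices, not OpenAI's actual rates, and a hypothetical lookup shape; the real table lives in Phoenix under Settings → Models.

```python
# Sketch: per-span cost from token counts and a per-million-token price table.
# PRICES values are illustrative placeholders, not real OpenAI rates.
PRICES = {
    ("openai", "gpt-4.1"): {"prompt": 2.00, "completion": 8.00},  # USD per 1M tokens
}

def span_cost(provider, model, prompt_tokens, completion_tokens):
    entry = PRICES.get((provider, model))
    if entry is None:
        return None  # no pricing match: cost shows as zero/unknown
    return (prompt_tokens * entry["prompt"]
            + completion_tokens * entry["completion"]) / 1_000_000

print(span_cost("openai", "gpt-4.1", 275, 68))          # matched: small dollar amount
print(span_cost("openai", "my-custom-model", 275, 68))  # → None (no pricing entry)
```

The second call illustrates the symptom in this thread: token counts arrive, but an unmatched model/provider pair yields no cost.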
RunLLM I could not solve it. Please explain it to me in detail.
