RunLLM I'm having some issues getting traces for my completion calls. I am using LiteLLM to make calls to Claude hosted on Bedrock. What instrumentors do I need to use? I used LiteLLMInstrumentor and it didn't result in any traces. To confirm, I used LiteLLM with an OpenAI model and OpenAIInstrumentor and it worked. Should I instead be using an AnthropicInstrumentor or a BedrockInstrumentor? Ideally I can just use LiteLLM and it'll support all models that LiteLLM supports.
To instrument LiteLLM calls for models hosted on AWS Bedrock, you should use the BedrockInstrumentor rather than the LiteLLMInstrumentor. The BedrockInstrumentor is specifically designed to handle AWS Bedrock models, including those from Anthropic like Claude. According to the Bedrock Integration Guide, you can instrument your boto3 client to send traces to Phoenix.

Here's a basic setup:

1. **Install the necessary packages**:
```
pip install arize-phoenix-otel openinference-instrumentation-bedrock boto3
```

2. **Instrument your application**:
```
import boto3
from openinference.instrumentation.bedrock import BedrockInstrumentor
from phoenix.otel import register

# Register the Phoenix tracer
tracer_provider = register(project_name="my-llm-app")

# Instrument the boto3 client
BedrockInstrumentor().instrument(tracer_provider=tracer_provider)

# Create a boto3 client for Bedrock
session = boto3.session.Session()
client = session.client("bedrock-runtime")
```

3. **Make calls using the instrumented client**:
```
response = client.invoke_model(modelId="anthropic.claude-v2", body=prompt)
```

This setup will ensure that your calls to AWS Bedrock models are traced and sent to Phoenix for observability.

Would you like more detail on any specific part?

Sources:
- [https://docs.arize.com/phoenix/tracing/integrations-tracing/bedrock#P%3A1](https://docs.arize.com/phoenix/tracing/integrations-tracing/bedrock#P%3A1)
- [https://github.com/Arize-ai/phoenix/blob/main/tutorials/integrations/bedrock_tracing_tutorial.ipynb](https://github.com/Arize-ai/phoenix/blob/main/tutorials/integrations/bedrock_tracing_tutorial.ipynb)
💡 Hint: Mention RunLLM in the thread for followups.
The LiteLLMInstrumentor is designed to work with the LiteLLM library, which allows you to call various LLM APIs using the OpenAI format. However, it may not fully support all the intricacies of AWS Bedrock's specific implementation, especially when it comes to tracing. The BedrockInstrumentor is specifically tailored to handle the AWS Bedrock environment, ensuring that all calls to Bedrock models are properly traced and sent to Phoenix.

While the LiteLLMInstrumentor can be used for models that are directly compatible with the OpenAI API format, it might not capture all the necessary details for Bedrock models due to differences in how these models are accessed and managed. If you are using LiteLLM to switch between different model providers, you might need to use the specific instrumentor for each provider to ensure full compatibility and traceability.

Would you like more detail on any specific part?

Sources:
- https://github.com/Arize-ai/openinference/issues/1303
- https://github.com/Arize-ai/openinference/issues/1233
I am calling completion and acompletion with stream=True/False, but none of the 4 combinations leads to any traces for me when I use Claude 3.7 on AWS Bedrock.
Ah, makes sense. Then I'll wait for that change before trying the LiteLLM instrumentor and will go with Bedrock for now. Let me know if you want any examples from my side.
Actually, this is interesting: we seem to have solved the issue for us specifically. Some context:
- In our production setting we have defined a CachedLitellm class that essentially adds a cache check for any completion or acompletion call to LiteLLM.
- We have kept the method names of CachedLitellm as completion and acompletion, the same as litellm.

I tested 2 things:
1. A script that calls litellm.completion and acompletion directly
   - attached as a snippet in the next message
   - This actually brings the traces and spans I need (see first image)
   - This only uses LiteLLMInstrumentor
2. A script that calls my cached_litellm.completion and acompletion
   - not attached, but pretty much the same
   - This doesn't bring in any LLM spans (see second image)
3. Finally, I changed my cached_litellm.acompletion and completion to have a handle_ prefix and tested the script again with cached_litellm
   - Huzzah! The LLM spans are now showing up (see third image)

I guess the instrumentor didn't like us overriding the function names in our cached LiteLLM class?
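For what it's worth, the behavior above is consistent with how these instrumentors generally work: they monkey-patch the module-level litellm.completion / litellm.acompletion functions in place. A minimal, library-free sketch of the pitfall, assuming the wrapper class binds the original function too early (the class names and the instrument() stand-in here are hypothetical, not the real OpenInference internals):

```python
# Sketch of the monkey-patching pitfall: an instrumentor reassigns the
# module-level attribute, so a reference captured *before* instrument()
# runs bypasses the patched (traced) function.
import types

# Stand-in for the litellm module.
litellm = types.SimpleNamespace(completion=lambda **kw: "raw response")

class CachedLitellmEarlyBind:
    # BAD: binds the original function at class-definition time.
    completion = staticmethod(litellm.completion)

class CachedLitellmLateBind:
    # GOOD: looks the function up at call time, so it sees the patch.
    def completion(self, **kwargs):
        return litellm.completion(**kwargs)

def instrument():
    # Roughly what an instrumentor does: wrap and reassign the attribute.
    original = litellm.completion
    def traced(**kwargs):
        return "traced:" + original(**kwargs)
    litellm.completion = traced

instrument()

print(CachedLitellmEarlyBind().completion())  # raw response (patch bypassed)
print(CachedLitellmLateBind().completion())   # traced:raw response
```

If a wrapper captures a reference to the original function before LiteLLMInstrumentor().instrument() runs (or otherwise shadows the patched attribute), the calls go to the unpatched function and no spans are emitted; looking the function up at call time picks up the patch.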
Curious, what was the issue you also replicated? Was it the first point (1) above? On a separate note, should I be seeing a span with the final accumulated streaming output, or should stream events/chunks show up as separate spans?
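On the streaming question: OpenInference instrumentors typically record one LLM span per completion call and attach the accumulated output once the stream is exhausted, rather than one span per chunk (chunks may surface as events on that span). A rough, library-free sketch of that accumulation pattern, with hypothetical names (traced_stream, on_finish):

```python
# Hypothetical sketch: a streaming wrapper yields chunks through to the
# caller while accumulating them, then reports one final value when the
# stream is exhausted: one span per call, not one span per chunk.

def traced_stream(chunks, on_finish):
    pieces = []
    for chunk in chunks:
        pieces.append(chunk)    # accumulate for the span's final output
        yield chunk             # the caller still sees every chunk live
    on_finish("".join(pieces))  # attach the accumulated output at the end

captured = []
for chunk in traced_stream(iter(["Hel", "lo", "!"]), captured.append):
    pass
print(captured)  # ['Hello!']
```

This also explains why the stream has to be fully consumed before the span can carry the final output.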
Here is the script for all of 1-3; the only difference in 2 and 3 is that instead of litellm.completion I call cached_litellm_obj.completion / cached_litellm_obj.handle_completion.
# %%
import asyncio
from hx_platform.tools.local_settings_util import load_settings_to_env
load_settings_to_env()
# %%
import os
import logging
from openinference.instrumentation.litellm import LiteLLMInstrumentor
from openinference.semconv.resource import ResourceAttributes
from opentelemetry import trace as trace_api
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk import trace as trace_sdk
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from openinference.instrumentation.openai import OpenAIInstrumentor
import litellm
# LlamaIndexInstrumentor().instrument()
LiteLLMInstrumentor().instrument()
# OpenAIInstrumentor().instrument()
# Use service_name if provided, otherwise fall back to project_name
resource_attributes = {}
resource_attributes[ResourceAttributes.PROJECT_NAME] = 'test4'
resource = Resource.create(resource_attributes)
tracer_provider = trace_sdk.TracerProvider(resource=resource)
multi_span_processor = trace_sdk.ConcurrentMultiSpanProcessor()
if os.environ.get("PHOENIX_COLLECTOR_ENDPOINT"):
    # Add OTLP exporter
    multi_span_processor.add_span_processor(
        SimpleSpanProcessor(
            OTLPSpanExporter("http://localhost:6006/v1/traces")
        )
    )
else:
    logging.info(
        "PHOENIX_COLLECTOR_ENDPOINT not set. Phoenix Tracing will not be enabled."
    )
tracer_provider.add_span_processor(multi_span_processor)
trace_api.set_tracer_provider(tracer_provider)
MODEL_NAME = 'bedrock/anthropic.claude-3-7-sonnet-20250219-v1:0'
MODEL_ID = 'arn:aws:bedrock:<YOUR_AWS_REGION>:<YOUR_AWS_ACCOUNT_ID>:inference-profile/<MODEL_NAME>:<MODEL_VERSION>'
AWS_ACCESS_KEY_ID = os.environ.get("AWS_ACCESS_KEY_ID")
AWS_SECRET_ACCESS_KEY = os.environ.get("AWS_SECRET_ACCESS_KEY")
AWS_REGION = os.environ.get("AWS_REGION_NAME")
# %%
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "search_knowledge_base",
            "description": "Search the knowledge base for relevant information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The search query"
                    },
                    "limit": {
                        "type": "integer",
                        "description": "Maximum number of results to return",
                        "default": 5
                    }
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_incident_details",
            "description": "Get details about a specific incident",
            "parameters": {
                "type": "object",
                "properties": {
                    "incident_id": {
                        "type": "string",
                        "description": "The unique identifier for the incident"
                    }
                },
                "required": ["incident_id"]
            }
        }
    }
]
INPUT_MESSAGES = [
    {"role": "system", "content": "You are an AI assistant helping with incident management. You can search for information and retrieve incident details to help users."},
    {"role": "user", "content": "What are the recent critical incidents in our production environment?"}
]
tracer = trace_api.get_tracer(__name__)
# %%
## No Stream Sync Completion Claude With Thinking + Tools
with tracer.start_as_current_span("litellm.completion_sync_with_tools_and_thinking") as span:
    out = litellm.completion(
        model=MODEL_NAME,
        model_id=MODEL_ID,
        api_key=None,
        aws_access_key_id=AWS_ACCESS_KEY_ID,
        aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
        aws_region_name=AWS_REGION,
        messages=INPUT_MESSAGES,
        thinking={"type": "enabled", "budget_tokens": 2000},
        temperature=1,
        tools=TOOLS
    )
print(out)
# %%
# No Stream Async Completion Claude With Thinking + Tools
with tracer.start_as_current_span("litellm.acompletion_async_with_tools_and_thinking") as span:
    out2 = asyncio.run(
        litellm.acompletion(
            model=MODEL_NAME,
            model_id=MODEL_ID,
            api_key=None,
            aws_access_key_id=AWS_ACCESS_KEY_ID,
            aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
            aws_region_name=AWS_REGION,
            messages=INPUT_MESSAGES,
            thinking={"type": "enabled", "budget_tokens": 2000},
            temperature=1,
            tools=TOOLS
        )
    )
print(out2)
# %%
with tracer.start_as_current_span("litellm.completion_sync_with_tools_and_thinking") as span:
    out3 = litellm.completion(
        model=MODEL_NAME,
        model_id=MODEL_ID,
        api_key=None,
        aws_access_key_id=AWS_ACCESS_KEY_ID,
        aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
        aws_region_name=AWS_REGION,
        messages=INPUT_MESSAGES,
        temperature=1,
        thinking={"type": "enabled", "budget_tokens": 2000},
        stream=True,
        tools=TOOLS
    )
    # Consume the stream inside the span so the streamed output is captured
    for chunk in out3:
        print(chunk)
# %%
async def run():
    with tracer.start_as_current_span("litellm.acompletion_async_with_tools_and_thinking") as span:
        out4 = await litellm.acompletion(
            model=MODEL_NAME,
            model_id=MODEL_ID,
            api_key=None,
            aws_access_key_id=AWS_ACCESS_KEY_ID,
            aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
            aws_region_name=AWS_REGION,
            messages=INPUT_MESSAGES,
            temperature=1,
            thinking={"type": "enabled", "budget_tokens": 2000},
            stream=True,
            tools=TOOLS
        )
        async for chunk in out4:
            print(chunk)

asyncio.run(run())
# %%
