How to Log Traces for Streaming Output in LiteLLM

Jan 11, 2025 08:35 PM

Hi - I've got a noob question: it's not immediately clear to me from the docs/examples how to log a trace for a streaming output. Are there any docs or examples you can point me to (using LiteLLM)?

13 comments


RunLLM
APP
·
To log a trace for a streaming output using LiteLLM, you can use the LiteLLMInstrumentor from the OpenInference library. This involves setting up a tracer, creating spans for the streaming interaction, and updating the span with the output messages as they are received. Here's a basic outline:
1. Set up the tracer: initialize the tracer and span processor to send data to the OTLP endpoint.
2. Create a span: start a span for the LiteLLM request.
3. Handle streaming: as you receive chunks from the streaming response, update the span with the llm.output_messages attribute.
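The three steps can be sketched as a small generator that wraps the stream, yields each text delta, and writes the accumulated output onto a span once the stream ends. This is a minimal sketch, not how the OpenInference instrumentor itself works: it assumes you start the span yourself (separately from any span LiteLLMInstrumentor creates) and that `span` follows the OpenTelemetry Span interface (`set_attribute`, `record_exception`, `end`). The attribute keys follow the OpenInference semantic conventions; `traced_stream` is a hypothetical helper name.

```python
from typing import Any, Iterable, Iterator


def traced_stream(chunks: Iterable[Any], span: Any) -> Iterator[str]:
    """Yield text deltas from a LiteLLM-style stream, then record the full
    output on `span` (assumed to implement the OpenTelemetry Span interface)."""
    # Mark the span as an LLM span per the OpenInference conventions.
    span.set_attribute("openinference.span.kind", "LLM")
    parts: list[str] = []
    try:
        for chunk in chunks:
            # Skip chunks without choices (e.g. a trailing usage-only chunk).
            if not chunk.choices:
                continue
            delta = chunk.choices[0].delta.content
            if delta is not None:
                parts.append(delta)
                yield delta
    except Exception as exc:
        # Attach the error to the span before re-raising it to the caller.
        span.record_exception(exc)
        raise
    finally:
        # Runs when the stream is exhausted, raises, or the generator is closed.
        text = "".join(parts)
        span.set_attribute("output.value", text)
        span.set_attribute("llm.output_messages.0.message.role", "assistant")
        span.set_attribute("llm.output_messages.0.message.content", text)
        span.end()
```

In real use you might create the span with `tracer_provider.get_tracer(__name__).start_span("litellm.completion")` and pass it in together with the `completion(..., stream=True)` response. Ending the span in `finally` means it is still exported if the stream raises or the consumer closes the generator early.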
For a detailed example, you can refer to the LiteLLM instrumentation example provided in the OpenInference GitHub repository.
Would you like more detail on setting up the tracer or handling the streaming response?
Sources:
- https://github.com/Arize-ai/phoenix/issues/5274
- https://docs.arize.com/phoenix/tracing/integrations-tracing/litellm#uUHGzGPUwiFw
💡 Hint: Mention RunLLM in the thread for followups.
Anthony P.
·
Hey Harpreet S., thanks for reaching out! Are you hitting a specific error that RunLLM's answer did not address? I'm happy to help with that.
Harpreet S.
·
The response just wasn't helpful
Anthony P.
·
Sorry to hear that; I can help out. Have you set up any instrumentation yet, or gotten any traces into Phoenix at all?

Harpreet S.

Basically, I'm just looking for a pattern to log my traces with a streaming response. Using the examples in the docs with streaming leads to no output being logged, and I can't find anything in the docs that shows how to set this up. For instrumentation, I'm using the following pattern:

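from phoenix.otel import register
from huggingface_hub.utils import build_hf_headers
from openinference.instrumentation.litellm import LiteLLMInstrumentor

# HF_SPACE_ENDPOINT (defined elsewhere) is the collector endpoint of a Phoenix
# instance hosted on Hugging Face Spaces; build_hf_headers() adds the HF auth token.
tracer_provider = register(
    project_name="testing",
    endpoint=HF_SPACE_ENDPOINT,
    batch=True,  # export spans in batches rather than one at a time
    headers=build_hf_headers(),
)

# Patch LiteLLM so that calls such as completion() emit spans.
LiteLLMInstrumentor().instrument(tracer_provider=tracer_provider)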

Anthony P.
·
Got it, thanks for confirming. Let me try to replicate this on my end, if you don't mind waiting a few minutes.

Harpreet S.

Here's my generation pattern:

from litellm import completion

# LITE_LLM_MODEL_NAME is assumed to be defined elsewhere (any model string LiteLLM accepts).

SYSTEM_PROMPT = "You're a rapping assistant who always responds with the dopest flows. You're one of the top rapping AIs out there. Better than the others. Your name is Sir Mix-a-Token"

USER_MESSAGE = "Write a rap about how my life got flipped turned upside down."

MESSAGE_PAYLOAD = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": USER_MESSAGE},
]

# stream=True returns an iterator of chunks; include_usage requests a
# token-usage report at the end of the stream.
response = completion(
    model=LITE_LLM_MODEL_NAME,
    messages=MESSAGE_PAYLOAD,
    temperature=0.2,
    top_p=0.9,
    frequency_penalty=0.1,
    presence_penalty=0.1,
    max_tokens=512,
    stream=True,
    stream_options={"include_usage": True},
)

# Print each text delta as it arrives. A chunk may have an empty choices
# list (e.g. a usage-only chunk) or a None content delta, so guard both.
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
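
Whichever way the output ends up being logged, you need the full generated text once the stream finishes, and the loop above only prints it. Below is a minimal, stdlib-only sketch that prints deltas as they arrive and returns the complete text plus any usage object; `consume_stream` and `StreamResult` are hypothetical names. OpenAI documents that with `stream_options={"include_usage": True}` the final chunk carries `usage` and an empty `choices` list; whether LiteLLM passes that through unchanged may depend on the provider, so the sketch handles both cases.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Optional


@dataclass
class StreamResult:
    text: str
    usage: Optional[Any] = None


def consume_stream(chunks: Iterable[Any], echo: bool = True) -> StreamResult:
    """Print each text delta as it arrives; return the full text and usage."""
    parts: list[str] = []
    usage = None
    for chunk in chunks:
        # Usage may arrive on any chunk (often the last), so keep the latest one.
        chunk_usage = getattr(chunk, "usage", None)
        if chunk_usage is not None:
            usage = chunk_usage
        # Skip chunks that carry no choices, such as a usage-only final chunk.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta is not None:
            parts.append(delta)
            if echo:
                print(delta, end="", flush=True)
    return StreamResult(text="".join(parts), usage=usage)
```

Calling `result = consume_stream(response)` in place of the print loop gives you `result.text`, which is what you would attach to a span (for example as `output.value`) if you record the output yourself.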

Anthony P.
·
I'm seeing the same issue: traces appear with the input, but no output.
Anthony P.
·
Let me check and see if we have an issue filed for this
Anthony P.
·
Ah yeah, something similar appears to be described here. Sorry about that! Let me see if I can come up with a quick workaround for you. In the meantime, feel free to subscribe to that issue and give it a bump; I can bring it up with the team.
Harpreet S.
·
Thanks! I was spinning in circles for a couple of hours wondering if it was a me issue or something else
Anthony P.
·
Really sorry about that, I know that can be frustrating
Harpreet S.
·
All good, thanks for looking into it!