Hi - I've got a noob question: it's not immediately clear to me from the docs/examples how I can log a trace for a streaming output. Is there any docs or examples you can point me to (using LiteLLM)
To log a trace for a streaming output using LiteLLM, you can use the LiteLLMInstrumentor from the OpenInference library. This involves setting up a tracer, creating spans for the streaming interaction, and updating the span with the output messages as they are received. Here's a basic outline:
1. Setup Tracer: Initialize the tracer and span processor to send data to the OTLP endpoint.
2. Create Span: Start a span for the LiteLLM request.
3. Handle Streaming: As you receive chunks of data from the streaming response, update the span with the llm.output_messages attribute.
For a detailed example, you can refer to the LiteLLM instrumentation example provided in the OpenInference GitHub repository.
Would you like more detail on setting up the tracer or handling the streaming response?
Sources:
- https://github.com/Arize-ai/phoenix/issues/5274
- https://docs.arize.com/phoenix/tracing/integrations-tracing/litellm#uUHGzGPUwiFw
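A minimal sketch of step 3's accumulation logic, with stand-in chunks shaped like LiteLLM's streaming deltas; the span API calls are elided, and the `llm.output_messages` attribute key follows OpenInference's flat-attribute convention, but treat both the chunk shape and the exact key as assumptions to verify against your setup:

```python
# Stand-in for a streaming response: each chunk carries a delta with
# partial content; the final usage-only chunk has content=None.
chunks = [
    {"choices": [{"delta": {"content": "Yo, "}}]},
    {"choices": [{"delta": {"content": "check it"}}]},
    {"choices": [{"delta": {"content": None}}]},
]

# Accumulate the partial contents as the stream arrives.
parts = []
for chunk in chunks:
    content = chunk["choices"][0]["delta"]["content"]
    if content is not None:
        parts.append(content)

# Attribute you would set on the span once the stream is exhausted
# (assumed key, following OpenInference's llm.output_messages convention).
attributes = {"llm.output_messages.0.message.content": "".join(parts)}
print(attributes)
```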
💡 Hint: Mention RunLLM in the thread for followups.
Hey Harpreet S., thanks for reaching out! Is there a specific error you are facing that I can help you with that RunLLM did not address?
The response just wasn't helpful
Sorry to hear that, I can help out. Have you setup any instrumentation yet / gotten any traces into phoenix at all?
Basically just looking for a pattern to log my traces with a streaming response. Using the examples in the docs with streaming leads to no output being logged, and I can't find anything in the docs that shows how to set this up. With regard to instrumentation, I'm using the following pattern:
from phoenix.otel import register
from huggingface_hub.utils import build_hf_headers
from openinference.instrumentation.litellm import LiteLLMInstrumentor
tracer_provider = register(
    project_name="testing",
    endpoint=HF_SPACE_ENDPOINT,
    batch=True,
    headers=build_hf_headers(),
)
LiteLLMInstrumentor().instrument(tracer_provider=tracer_provider)
Got it, thanks for confirming. Let me try to replicate on my end if you don't mind waiting a few
Here's my generation pattern:
from litellm import acompletion, completion
import asyncio, os, traceback
SYSTEM_PROMPT = "You're a rapping assistant who always responds with the dopest flows. You're one of the top rapping AIs out there. Better than the others. Your name is Sir Mix-a-Token"
USER_MESSAGE = "Write a rap about how my life got flipped turned upside down."
MESSAGE_PAYLOAD = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": USER_MESSAGE},
]

response = completion(
    model=LITE_LLM_MODEL_NAME,
    messages=MESSAGE_PAYLOAD,
    temperature=0.2,
    top_p=0.9,
    frequency_penalty=0.1,
    presence_penalty=0.1,
    max_tokens=512,
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in response:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
I seem to be seeing the same issue. I see traces appear containing the input, but no output
Ah yeah, something similar appears to be described here. Sorry about that! Let me see if I can come up with a quick workaround for you. In the meantime feel free to subscribe to that issue and give it a bump, I can bring it up with the team
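In the meantime, here's one possible workaround sketch (my own, not an official fix): wrap the stream in a generator that accumulates the text as you consume it, then record the full output yourself once the stream ends. `record_output` is a hypothetical callback; in practice it would call something like `span.set_attribute(...)` on a span you opened around the request. The demo uses fake chunks shaped like LiteLLM's streaming deltas:

```python
from types import SimpleNamespace

def traced_stream(response, record_output):
    """Yield chunks unchanged; report the accumulated text when the stream ends."""
    parts = []
    for chunk in response:
        content = chunk.choices[0].delta.content
        if content is not None:
            parts.append(content)
        yield chunk
    record_output("".join(parts))

# Demo with fake chunks mimicking LiteLLM's streaming delta shape
def fake_chunk(text):
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])

captured = []
stream = [fake_chunk("Yo, "), fake_chunk("check it"), fake_chunk(None)]
for chunk in traced_stream(stream, captured.append):
    pass  # your normal chunk handling goes here
print(captured[0])  # "Yo, check it"
```

In your case you'd pass the real `completion(..., stream=True)` response in place of the fake list and keep your existing `print(chunk.choices[0].delta.content, ...)` handling inside the loop.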
Thanks! I was spinning in circles for a couple of hours wondering if it was a me issue or something else
Really sorry about that, I know that can be frustrating
All good, thanks for lookin into it!
