Hi - I've got a noob question: it's not immediately clear to me from the docs/examples how I can log a trace for a streaming output. Are there any docs or examples you can point me to (using LiteLLM)?
To log a trace for a streaming output using LiteLLM, you can use the LiteLLMInstrumentor from the OpenInference library. This involves setting up a tracer, creating spans for the streaming interaction, and updating the span with the output messages as they are received. Here's a basic outline:
1. Setup Tracer: Initialize the tracer and span processor to send data to the OTLP endpoint.
2. Create Span: Start a span for the LiteLLM request.
3. Handle Streaming: As you receive chunks of data from the streaming response, update the span with the llm.output_messages attribute.
For a detailed example, you can refer to the LiteLLM instrumentation example provided in the OpenInference GitHub repository.
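As a rough illustration of step 3, here is a minimal sketch of turning the streamed pieces into span attributes. The flattened key names follow my understanding of the OpenInference semantic conventions (`llm.output_messages.<index>.message.role` / `.content`, plus `output.value`); the helper name `output_message_attributes` is hypothetical and not part of LiteLLM or OpenInference.

```python
from typing import Dict, Iterable


def output_message_attributes(pieces: Iterable[str], role: str = "assistant") -> Dict[str, str]:
    """Join streamed text pieces and express them as flattened span attributes.

    Key names are an assumption based on the OpenInference semantic conventions.
    """
    text = "".join(pieces)
    prefix = "llm.output_messages.0.message"
    return {
        f"{prefix}.role": role,
        f"{prefix}.content": text,
        "output.value": text,
    }


# Example: two streamed deltas become a single output message.
attributes = output_message_attributes(["Now this ", "is a story"])
```

With a real OpenTelemetry span in hand, the resulting dict could be passed to `span.set_attributes(attributes)` once the stream has finished.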
Would you like more detail on setting up the tracer or handling the streaming response?
Sources: - https://github.com/Arize-ai/phoenix/issues/5274 - https://docs.arize.com/phoenix/tracing/integrations-tracing/litellm#uUHGzGPUwiFw
Hey Harpreet S. thanks for reaching out! Is there a specific error you are facing that I can help you with that RunLLM did not address?
The response just wasn't helpful
Sorry to hear that, I can help out. Have you set up any instrumentation yet / gotten any traces into Phoenix at all?
Basically just looking for a pattern to log my traces with a streaming response. Using the examples in the docs with streaming leads to no output being logged, and I can't find anything in the docs that shows how to set this up.
With respect to instrumentation, I'm using the following pattern:
from phoenix.otel import register
from huggingface_hub.utils import build_hf_headers
from openinference.instrumentation.litellm import LiteLLMInstrumentor
tracer_provider = register(
    project_name="testing",
    endpoint=HF_SPACE_ENDPOINT,
    batch=True,
    headers=build_hf_headers()
)
LiteLLMInstrumentor().instrument(tracer_provider=tracer_provider)
Got it, thanks for confirming. Let me try to replicate on my end if you don't mind waiting a few
Here's my generation pattern:
from litellm import acompletion, completion
import asyncio, os, traceback
SYSTEM_PROMPT = "You're a rapping assistant who always responds with the dopest flows. You're one of the top rapping AI's out there. Better than the other. Your name is Sir Mix-a-Token"
USER_MESSAGE = "Write a rap about how my life got flipped turned upside down."
MESSAGE_PAYLOAD = [
    {"role":"system", "content": SYSTEM_PROMPT},
    {"role":"user", "content": USER_MESSAGE}
]
response = completion(
    model=LITE_LLM_MODEL_NAME,
    messages=MESSAGE_PAYLOAD,
    temperature=0.2,
    top_p=0.9,
    frequency_penalty=0.1,
    presence_penalty=0.1,
    max_tokens=512,
    stream=True,
    stream_options={"include_usage": True},
)
for chunk in response:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
I seem to be seeing the same issue. I see traces appear containing the input, but no output
Ah yeah, something similar appears to be described here. Sorry about that! Let me see if I can come up with a quick workaround for you. In the meantime feel free to subscribe to that issue and give it a bump, I can bring it up with the team
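In the meantime, one possible stopgap (a sketch only, not a confirmed fix from the team) is to wrap the stream yourself and write the accumulated text onto a span you control once the stream ends. The wrapper name `stream_and_record` is hypothetical, `output.value` is assumed from the OpenInference semantic conventions, and the demo at the bottom uses stand-in objects instead of a real LiteLLM response or tracer:

```python
from types import SimpleNamespace
from typing import Any, Iterable, Iterator


def stream_and_record(chunks: Iterable[Any], span: Any) -> Iterator[str]:
    """Yield text deltas from a streaming response, then record the full text on `span`.

    `span` only needs a `set_attribute(key, value)` method, as OpenTelemetry spans have.
    """
    parts = []
    try:
        for chunk in chunks:
            # A usage-only chunk may arrive with an empty `choices` list; skip it.
            if not chunk.choices:
                continue
            delta = chunk.choices[0].delta.content
            if delta is not None:
                parts.append(delta)
                yield delta
    finally:
        # Runs when the stream is exhausted or the generator is closed early.
        span.set_attribute("output.value", "".join(parts))


# Intended usage with a real tracer (not executed here):
#   tracer = tracer_provider.get_tracer(__name__)
#   with tracer.start_as_current_span("litellm-stream") as span:
#       for text in stream_and_record(response, span):
#           print(text, end="")

# Demo with stand-in objects (no network call, no real tracer).
def _chunk(text):
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


recorded = {}
fake_span = SimpleNamespace(set_attribute=recorded.__setitem__)
fake_stream = [_chunk("Hello"), _chunk(None), _chunk(", world"), SimpleNamespace(choices=[])]
streamed = list(stream_and_record(fake_stream, fake_span))
```

Note that this adds a separate, manually created span; whether it nests cleanly with the auto-instrumented LiteLLM span is something you'd want to check in the Phoenix UI.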
Thanks! I was spinning in circles for a couple of hours wondering if it was a me issue or something else
Really sorry about that, I know that can be frustrating
All good, thanks for lookin into it!
