We've been using non-streaming LLM calls up until now and haven't seen the following issue, which appeared with the new streaming calls we've added. We have a nested tracing structure (start_as_current_span calls inside other start_as_current_span calls) with one root parent span that wraps all code. The streaming LLM calls' spans get orphaned and don't connect to the nested trace and its spans (non-streaming LLM spans continue to be correctly associated with the nested trace spans). Are there any best practices to follow for logging streaming LLM spans correctly?
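(For context, a minimal sketch of the nesting pattern described above; span and function names here are illustrative, not from the actual codebase:)

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

def run_workflow():
    # one root parent span wraps all code
    with tracer.start_as_current_span("workflow-root"):
        with tracer.start_as_current_span("chat-step"):
            # non-streaming LLM calls made here nest correctly;
            # streaming LLM calls produce orphaned LLM spans
            ...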
RunLLM what if I'm using litellm.completion and OpenAIInstrumentor (auto-instrumentation)? This is the crux of my code:
res: Any = completion(
    api_key=self.config.api_key,
    api_base=self.config.api_base,
    api_version=self.config.api_version,
    model=engine,
    messages=messages,  # type: ignore
    temperature=temperature,
    max_tokens=max_tokens,
    top_p=top_p,
    frequency_penalty=self.config.frequency_penalty,
    presence_penalty=self.config.presence_penalty,
    stop=stop,
    stream=stream,
    seed=seed,
    response_format=response_format,
    extra_body=extra_body,
    **tools_kwargs,
)
if stream:
    role: Any = None
    for stream_res in res:
        if not stream_res.choices:
            continue
        delta = stream_res.choices[0].delta
        if delta is None:
            continue
        role = delta.role if delta.role is not None else role
        # default missing content to "" so it's never None here
        content = delta.content if delta.content is not None else ""
        yield format_chat_message(role, content)

The actual LLM span creation, I'm assuming, is done by the packages above (specifically at the end of .completion), so I don't have direct control. Should I just manage the context before I make the completion call?
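(One possible approach, sketched under the assumption that the auto-instrumentation parents the LLM span from whatever OpenTelemetry context is active when completion() is invoked; the capture point and variable names are illustrative:)

from litellm import completion
from opentelemetry import context as otel_context

# captured earlier, at a point where the nested span hierarchy is current
parent_ctx = otel_context.get_current()

token = otel_context.attach(parent_ctx)
try:
    # the instrumented LLM span should start under the attached context,
    # even though it won't finish until the stream is fully consumed
    res = completion(model=engine, messages=messages, stream=True)
finally:
    otel_context.detach(token)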
Hi Swaraj R.! When streaming, there are a number of situations that might cause your span hierarchy to change relative to the non-streaming case. When streaming an LLM call, the final span isn't submitted until the stream is fully consumed, which might cause the nesting to show up differently depending on when you're looking at the traces. Can you check to see if waiting until all the streams have finished and refreshing shows your spans the way you expect?
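(To illustrate the timing point: if the generator returned by the streaming call is only consumed after the parent span's scope has exited, the LLM span is finalized outside that scope. A sketch of consuming the stream while the parent span is still current; stream_chat_completion and render are hypothetical helpers:)

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("chat-step"):
    # exhaust the stream before this block exits, so the instrumented
    # LLM span finishes while the parent span is still open
    for message in stream_chat_completion(messages):  # hypothetical
        render(message)  # hypothetical consumer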
by waiting you mean check Phoenix after my workflow is done running? if so, i did check afterwards and there are 3 top-level traces:
1. the main one, with all non-streaming llm spans and all other spans, properly arranged
2. and 3. are the orphaned streaming llm spans
are all the streaming llm spans orphaned or only some of them?
all streaming are orphaned; all non-streaming are not
how is your perform_completion method called?
can you try getting the context outside the scope of the function making the streaming completion request?
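(A sketch of that suggestion, assuming the standard opentelemetry-api helpers; the function boundary and names are illustrative:)

from litellm import completion
from opentelemetry import trace

# in the caller, where the nested span hierarchy is correct:
parent_span = trace.get_current_span()

def run_streaming_completion(parent_span, engine, messages):
    # re-activate the captured span around the streaming request so the
    # auto-instrumented LLM span starts under it; end_on_exit=False
    # leaves the parent span open for the caller to close
    with trace.use_span(parent_span, end_on_exit=False):
        res = completion(model=engine, messages=messages, stream=True)
        return list(res)  # fully consume the stream inside the scope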
