We've been using non-streaming LLM calls up until now and haven't seen the following issue, which appeared with the new streaming calls we've added. We have a nested tracing structure (start_as_current_span calls inside other start_as_current_span calls) with one root parent span that wraps all code. The streaming LLM calls' spans get orphaned and don't connect to the nested trace, while non-streaming LLM spans continue to be correctly associated with the nested trace spans. Are there any best practices to follow when trying to log streaming LLM spans correctly?
It seems likely that the orphaned streaming LLM spans are caused by context propagation failing in asynchronous calls. My understanding is that when streaming, the context might not be automatically propagated, causing spans to become orphaned. According to this Slack discussion, you may need to manually set the span context in asynchronous calls to ensure they are correctly associated with the parent span. Would you like more detail on how to implement this?
Sources:
- Slack Discussion
💡 Hint: Mention RunLLM in the thread for followups.
To address the issue of orphaned streaming LLM spans within the same process, you need to ensure that the context is correctly propagated during asynchronous operations. This is crucial because asynchronous calls can lose the context if not explicitly managed. Here's a step-by-step approach based on the Slack discussion and manual instrumentation guide:
1. Start a Root Span: Begin by creating a root span using start_as_current_span. This will serve as the parent span for all subsequent operations.
2. Propagate Context in Async Calls: When dealing with asynchronous operations, especially streaming, manually propagate the context by setting the span in the context before making the async call and ensuring that context is used within the async function.
3. Use Context Managers: Utilize context managers to maintain the span context across asynchronous boundaries. This ensures that the span remains active and associated with the correct parent span.
4. End Spans Appropriately: Ensure that spans are ended correctly after the operation completes, including when an exception occurs (note that start_as_current_span used as a context manager ends the span for you, even on error).
Here's a code snippet to illustrate this:
```python
import asyncio

from opentelemetry import trace
from opentelemetry.context import attach, detach, get_current

tracer = trace.get_tracer(__name__)

async def streaming_operation():
    # Start a new span for the streaming operation
    with tracer.start_as_current_span("streaming_operation"):
        # Propagate the context manually
        token = attach(get_current())
        try:
            # Perform the streaming operation
            async for data in stream_data():  # stream_data: your streaming source
                pass  # Process data
        finally:
            # Detach the context to ensure proper cleanup;
            # the context manager ends the span, even if an error occurs
            detach(token)

async def main_workflow():
    # Start the root span
    with tracer.start_as_current_span("root_workflow"):
        # Call the streaming operation
        await streaming_operation()

# Run the main workflow
asyncio.run(main_workflow())
```
This approach ensures that the context is correctly propagated and maintained across asynchronous calls, preventing orphaned spans.
Sources:
- Slack Discussion
- manual instrumentation guide
RunLLM what if i'm using Litellm.completion and OpenAiInstrumentor (auto instrumentation)? this is the crux of my code:
```python
res: Any = completion(
    api_key=self.config.api_key,
    api_base=self.config.api_base,
    api_version=self.config.api_version,
    model=engine,
    messages=messages,  # type: ignore
    temperature=temperature,
    max_tokens=max_tokens,
    top_p=top_p,
    frequency_penalty=self.config.frequency_penalty,
    presence_penalty=self.config.presence_penalty,
    stop=stop,
    stream=stream,
    seed=seed,
    response_format=response_format,
    extra_body=extra_body,
    **tools_kwargs,
)
if stream:
    role: Any = None
    for stream_res in res:
        if not stream_res.choices:
            continue
        delta = stream_res.choices[0].delta
        if delta is None:
            continue
        role = delta.role if delta.role is not None else role
        content = delta.content if delta.content is not None else ""
        if content is None:
            continue
        yield format_chat_message(role, content)
```
the actual llm span creation i'm assuming is done by the above packages (specifically at the end of .completion), so i don't have direct control. Should i just manage the context before i make the completion call?
When using Litellm.completion with OpenAiInstrumentor for auto-instrumentation, the span creation is indeed handled by the instrumentation. However, to ensure that the context is correctly propagated, especially for streaming responses, you should manage the context before making the completion call. This involves setting the context so that the spans are correctly nested under the parent span. Here's how you can manage the context:
1. Start a Parent Span: Before making the completion call, start a parent span using start_as_current_span.
2. Ensure Context Propagation: Use the context management utilities to ensure that the context is correctly propagated during the streaming operation.
3. End the Parent Span: After the streaming operation is complete, ensure that the parent span is ended (the start_as_current_span context manager does this for you).
Here's a code snippet to illustrate this:
```python
from opentelemetry import trace
from opentelemetry.context import attach, detach, get_current

tracer = trace.get_tracer(__name__)

async def perform_completion():
    # Start a parent span
    with tracer.start_as_current_span("parent_span"):
        # Attach the current context
        token = attach(get_current())
        try:
            # Perform the completion call
            res = completion(
                api_key=self.config.api_key,
                api_base=self.config.api_base,
                api_version=self.config.api_version,
                model=engine,
                messages=messages,
                temperature=temperature,
                max_tokens=max_tokens,
                top_p=top_p,
                frequency_penalty=self.config.frequency_penalty,
                presence_penalty=self.config.presence_penalty,
                stop=stop,
                stream=stream,
                seed=seed,
                response_format=response_format,
                extra_body=extra_body,
                **tools_kwargs,
            )
            if stream:
                role = None
                for stream_res in res:
                    if not stream_res.choices:
                        continue
                    delta = stream_res.choices[0].delta
                    if delta is None:
                        continue
                    role = delta.role if delta.role is not None else role
                    content = delta.content if delta.content is not None else ""
                    if content is None:
                        continue
                    yield format_chat_message(role, content)
        finally:
            # Detach the context to ensure proper cleanup
            detach(token)

# perform_completion is an async generator, so consume it with async for
async for message in perform_completion():
    ...
```
This approach ensures that the context is correctly managed and propagated, preventing orphaned spans in the trace.
Sources:
- Slack Discussion
When dealing with streaming LLM calls and auto-instrumentation, ensuring proper context propagation can be tricky. If the parent span and the LLM streaming span are still orphaned, it might be due to the asynchronous nature of the streaming operation. Here are some additional steps you can take:
1. Ensure Context is Active: Before making the completion call, ensure that the context is active and correctly set. This can be done by using trace.use_span to explicitly set the span as the current span.
2. Check Instrumentation Configuration: Verify that the OpenAiInstrumentor and Litellm are correctly configured to propagate context. Sometimes, configuration issues can lead to context not being propagated as expected.
3. Debug Context Propagation: Use logging or debugging tools to check the context at various points in your code to ensure it is being propagated correctly.
Here's an updated approach:
```python
from opentelemetry import trace
from opentelemetry.context import attach, detach
from opentelemetry.trace import set_span_in_context

tracer = trace.get_tracer(__name__)

async def perform_completion():
    # Start a parent span
    with tracer.start_as_current_span("parent_span") as parent_span:
        # Set the span in the current context explicitly
        context = set_span_in_context(parent_span)
        token = attach(context)
        try:
            # Perform the completion call
            res = completion(
                api_key=self.config.api_key,
                api_base=self.config.api_base,
                api_version=self.config.api_version,
                model=engine,
                messages=messages,
                temperature=temperature,
                max_tokens=max_tokens,
                top_p=top_p,
                frequency_penalty=self.config.frequency_penalty,
                presence_penalty=self.config.presence_penalty,
                stop=stop,
                stream=stream,
                seed=seed,
                response_format=response_format,
                extra_body=extra_body,
                **tools_kwargs,
            )
            if stream:
                role = None
                for stream_res in res:
                    if not stream_res.choices:
                        continue
                    delta = stream_res.choices[0].delta
                    if delta is None:
                        continue
                    role = delta.role if delta.role is not None else role
                    content = delta.content if delta.content is not None else ""
                    if content is None:
                        continue
                    yield format_chat_message(role, content)
        finally:
            # Detach the context to ensure proper cleanup
            detach(token)

# perform_completion is an async generator, so consume it with async for
async for message in perform_completion():
    ...
```
This approach explicitly sets the span in the current context, which should help maintain the correct parent-child relationship in the trace.
Sources:
- Slack Discussion
Hi Swaraj R.! When streaming, there are a number of situations that might cause your span hierarchy to change relative to the non-streaming case. When streaming an LLM call, the final span isn't submitted until the stream is fully consumed, which might cause the nesting to show up differently depending on when you're looking at the traces. Can you check to see if waiting until all the streams have finished and refreshing shows your spans the way you expect?
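This is why the point of consumption matters. Here's a minimal stdlib-only sketch (using contextvars to stand in for OpenTelemetry's context machinery, not the real API) showing that a generator's body, and therefore whatever "current span" it sees, runs only when the stream is actually consumed:

```python
import contextvars

# A stand-in for OpenTelemetry's "current span" context variable.
current_span = contextvars.ContextVar("current_span", default=None)

def stream_chunks():
    # The body of a generator runs only when it is consumed, so the
    # parent recorded here depends on WHEN the caller iterates.
    parent = current_span.get()
    for chunk in ["hello", " world"]:
        yield (parent, chunk)

# Case 1: stream consumed while the parent "span" is still active.
token = current_span.set("root_workflow")
consumed_inside = list(stream_chunks())
current_span.reset(token)

# Case 2: stream created inside the parent, but consumed after it ended.
token = current_span.set("root_workflow")
stream = stream_chunks()
current_span.reset(token)  # parent "span" ends before consumption
consumed_outside = list(stream)

print(consumed_inside)   # chunks see "root_workflow" as parent
print(consumed_outside)  # chunks see None -> orphaned
```

The same effect applies to auto-instrumented streaming responses: if the response iterator is drained after the enclosing start_as_current_span block has exited, the instrumentation may record the span against a different (or empty) context.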
by waiting you mean check phoenix after my workflow is done running? if so, i did check afterwards and there are 3 top level traces:
1. the main one, with all non-streaming llm spans and all other spans, properly arranged
2. and 3. are the orphaned streaming llm spans
are all the streaming llm spans orphaned or only some of them?
all streaming are orphaned all non-streaming are not
how is your perform_completion method called?
can you try getting the context outside the scope of the function making the streaming completion request?
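That suggestion can be sketched like this, again using stdlib contextvars in place of the real OpenTelemetry context API (where attach/get_current would play the same role): capture the context while the parent span is active, outside the streaming function, then run each step of the generator inside that captured context so chunks consumed after the parent block exits still see the right parent.

```python
import contextvars

# Stand-in for OpenTelemetry's "current span" context variable.
current_span = contextvars.ContextVar("current_span", default=None)

def stream_with_captured_context(captured):
    # Run each step inside the captured context so the parent "span"
    # is visible even if the caller consumes the stream later.
    def step():
        return (current_span.get(), "chunk")
    for _ in range(2):
        yield captured.run(step)

# Capture the context while the parent "span" is active...
token = current_span.set("root_workflow")
captured = contextvars.copy_context()
stream = stream_with_captured_context(captured)
current_span.reset(token)  # parent ends before the stream is consumed

# ...and the chunks still see the captured parent.
chunks = list(stream)
print(chunks)  # both chunks carry "root_workflow"
```

With real OpenTelemetry, the analogue would be calling get_current() while the parent span is active and passing that context into the function that makes the streaming completion call, attaching it there before iterating the response.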
