Hi Phoenix team, quick question on manual instrumentation. We are making REST API calls to OpenAI instead of using the OpenAI SDK in Python, so we are planning to go the manual instrumentation route and set the right attributes ourselves. Our team had a thought today that we are looking for your guidance on: how, if at all, would what we send to Arize via the OTLP endpoint change if we had streaming responses from Azure OpenAI and are manually setting the llm output messages attribute?
To manually instrument REST API calls to OpenAI and handle streaming responses, you need to ensure that spans are created correctly and that the attributes are set appropriately. The process involves setting up a tracer, creating a span for each request, and adding the necessary attributes to that span. For streaming responses, you will need to handle the streamed data within the span and build up the llm.output_messages attribute as chunks arrive. Here's a basic outline of how you can achieve this:

1. Setup tracer: initialize the tracer and span processor to send data to the OTLP endpoint.
2. Create span: start a span for the OpenAI request.
3. Handle streaming: as you receive chunks from the streaming response, accumulate them and update the span's llm.output_messages attribute.

Here is a code snippet to illustrate this:

```python
import asyncio
from urllib.parse import urljoin

import httpx
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor

# Set up the tracer and export spans to your OTLP collector
resource = Resource(attributes={"service.name": "your-service-name"})
tracer_provider = TracerProvider(resource=resource)
trace.set_tracer_provider(tracer_provider)

collector_endpoint = urljoin("http://your-collector-host:your-collector-port", "v1/traces")
span_exporter = OTLPSpanExporter(endpoint=collector_endpoint)
simple_span_processor = SimpleSpanProcessor(span_exporter)
trace.get_tracer_provider().add_span_processor(simple_span_processor)

tracer = trace.get_tracer(__name__)

# Handle a streaming response, accumulating chunks into one attribute value
async def handle_streaming_response(url, headers, payload):
    async with httpx.AsyncClient() as client:
        with tracer.start_as_current_span("OpenAI Streaming Chat Completion") as span:
            chunks = []
            async with client.stream("POST", url, headers=headers, json=payload) as response:
                async for chunk in response.aiter_text():
                    chunks.append(chunk)
                    # Update the attribute with the accumulated text so far
                    span.set_attribute("llm.output_messages", "".join(chunks))
                    print(chunk)  # or handle the chunk as needed

# Example usage
url = "https://api.openai.com/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_OPENAI_API_KEY",
}
payload = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
    "stream": True,
}

asyncio.run(handle_streaming_response(url, headers, payload))
```

This sets up the tracer, creates a span for the OpenAI request, and handles the streaming response by updating the span's llm.output_messages attribute as chunks of data are received. Would you like more details on any specific part of this process?
Not sure if the answer above is accurate. Can we update spans like that? Aren't spans supposed to be immutable?
I'm not sure I understand what you're trying to do, Hakan. Are you trying to augment the spans generated by the auto-instrumentation?
My main question is: if I have a streaming OpenAI request, do I need to change anything on my end when I am manually setting the llm output messages span attribute? Like, will the full response be captured? We are doing manual instrumentation here, not using the OpenAIInstrumentor.
My assumption is no, but wanted to confirm here
I think I understand. If you are manually instrumenting streaming, you will need to collect individually emitted tokens into a single response message, count tokens, etc.
The goal is basically to make the attributes look as though they were not streaming.
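A minimal sketch of that accumulation step (attribute names here follow the OpenInference flattened-key style, e.g. `llm.output_messages.0.message.content`; the `finalize_llm_attributes` helper is hypothetical, just to illustrate collecting deltas before setting attributes once on the span):

```python
# Sketch: assemble streamed text deltas into the attributes you would
# set on the span once the stream is exhausted, so the result looks
# the same as a non-streaming response.

def finalize_llm_attributes(deltas):
    """Join streamed deltas and return the flattened span attributes
    (hypothetical helper; adjust keys to your semantic conventions)."""
    full_text = "".join(deltas)
    return {
        "llm.output_messages.0.message.role": "assistant",
        "llm.output_messages.0.message.content": full_text,
    }

# Inside your span, after the stream finishes, you would do:
#     for key, value in finalize_llm_attributes(deltas).items():
#         span.set_attribute(key, value)

deltas = ["Hello", ", how", " can I help?"]
attrs = finalize_llm_attributes(deltas)
print(attrs["llm.output_messages.0.message.content"])  # Hello, how can I help?
```

Token counts (e.g. a completion token count attribute) would be computed the same way, from the assembled text, since streaming chunks usually do not carry usage data.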
