Hi Phoenix team, quick question on manual instrumentation. We are making REST API calls to OpenAI instead of using the OpenAI SDK in Python, so we are planning to go the manual instrumentation route and set the right attributes ourselves. Our team had a thought today that we are looking for your guidance on: how, if at all, would what we send to Arize via the OTLP endpoint change if we had streaming responses from Azure OpenAI and are manually setting the llm output messages attribute?
To manually instrument REST API calls to OpenAI and handle streaming responses, you need to ensure that spans are created correctly and that the attributes are set appropriately. The process involves setting up a tracer, creating a span for each request, and adding the necessary attributes to that span. For streaming responses, you will need to handle the streamed data within the span and build up the llm.output_messages attribute as chunks arrive. Here's a basic outline of how you can achieve this:

1. Setup tracer: initialize the tracer and span processor to send data to the OTLP endpoint.
2. Create span: start a span for the OpenAI request.
3. Handle streaming: as you receive chunks from the streaming response, accumulate them and update the span's llm.output_messages attribute.

Here is a code snippet to illustrate this:

```python
import asyncio
from urllib.parse import urljoin

import httpx
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor

# Set up the tracer and export spans to your OTLP collector
resource = Resource(attributes={"service.name": "your-service-name"})
tracer_provider = TracerProvider(resource=resource)
trace.set_tracer_provider(tracer_provider)

collector_endpoint = urljoin("http://your-collector-host:your-collector-port", "v1/traces")
span_exporter = OTLPSpanExporter(endpoint=collector_endpoint)
simple_span_processor = SimpleSpanProcessor(span_exporter)
trace.get_tracer_provider().add_span_processor(simple_span_processor)

tracer = trace.get_tracer(__name__)

# Handle a streaming response, accumulating chunks into one attribute value
async def handle_streaming_response(url, headers, payload):
    async with httpx.AsyncClient() as client:
        with tracer.start_as_current_span("OpenAI Streaming Chat Completion") as span:
            chunks = []
            async with client.stream("POST", url, headers=headers, json=payload) as response:
                async for chunk in response.aiter_text():
                    chunks.append(chunk)
                    # Update the attribute with the accumulated text so far
                    span.set_attribute("llm.output_messages", "".join(chunks))
                    print(chunk)  # or handle the chunk as needed

# Example usage
url = "https://api.openai.com/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_OPENAI_API_KEY",
}
payload = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
    "stream": True,
}

asyncio.run(handle_streaming_response(url, headers, payload))
```

This sets up the tracer, creates a span for the OpenAI request, and handles the streaming response by updating the span's llm.output_messages attribute as chunks of data are received. Would you like more details on any specific part of this process?
Not sure if the answer above is accurate. Can we update spans like that? Aren't spans supposed to be immutable?
I'm not sure I understand what you're trying to do, Hakan. Are you trying to augment the spans generated by the auto-instrumentation?
My main question is: if I have a streaming OpenAI request, do I need to change anything on my end when I am manually setting the llm output messages span attribute? Like, will the full response be captured? We are doing manual instrumentation here, not using the OpenAIInstrumentor.
My assumption is no, but wanted to confirm here
I think I understand. If you are manually instrumenting streaming, you will need to collect individually emitted tokens into a single response message, count tokens, etc.
The goal is basically to make the attributes look as though they were not streaming.
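A minimal sketch of that accumulation step (attribute names here follow the OpenInference flattened-key style, e.g. `llm.output_messages.0.message.content`; the `finalize_llm_attributes` helper is hypothetical, just to illustrate collecting deltas before setting attributes once on the span):

```python
# Sketch: assemble streamed text deltas into the attributes you would
# set on the span once the stream is exhausted, so the result looks
# the same as a non-streaming response.

def finalize_llm_attributes(deltas):
    """Join streamed deltas and return the flattened span attributes
    (hypothetical helper; adjust keys to your semantic conventions)."""
    full_text = "".join(deltas)
    return {
        "llm.output_messages.0.message.role": "assistant",
        "llm.output_messages.0.message.content": full_text,
    }

# Inside your span, after the stream finishes, you would do:
#     for key, value in finalize_llm_attributes(deltas).items():
#         span.set_attribute(key, value)

deltas = ["Hello", ", how", " can I help?"]
attrs = finalize_llm_attributes(deltas)
print(attrs["llm.output_messages.0.message.content"])  # Hello, how can I help?
```

Token counts (e.g. a completion token count attribute) would be computed the same way, from the assembled text, since streaming chunks usually do not carry usage data.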
