Hi - This is probably explained somewhere that I haven't found yet, but I'm wondering which types of LLM operations will appear in my active Phoenix session, and which will not be automatically sent. For instance, I'm using LlamaIndex, and when I generate questions from a corpus as follows, my session remains unchanged, even though I'm calling the LLM for each question:

questions_df = llm_generate(
    dataframe=document_chunks_df,
    template=generate_questions_template,
    model=OpenAIModel(
        model_name="gpt-3.5-turbo-instruct",
    ),
    output_parser=output_parser,
)

However, when I answer each question with the same LLM, the Phoenix session actively records this info:

# loop over the questions and generate the answers
for _, row in questions_with_document_chunk_df.iterrows():
    question = row["question"]
    response_vector = query_engine.query(question)
    print(f"Question: {question}\nAnswer: {response_vector.response}\n")

Is this because in the second case I assigned the LLM earlier in my code, as part of the query_engine definition, as opposed to explicitly calling out the LLM model as in the first case? Or am I just confusing the types of LLM operations that Phoenix tracks versus the ones it ignores? Thanks, David
Hey David K., good question. TLDR, in order for trace data to appear in Phoenix, the model or application has to be instrumented with an OpenInference tracer. In the first example using llm_generate, the OpenAIModel (an instance of phoenix.experimental.evals.OpenAIModel) is not instrumented by default, so we don't expect any traces or spans from those LLM calls to appear inside Phoenix. As a side note, it's possible to instrument the openai Python SDK client that is wrapped by phoenix.experimental.evals.OpenAIModel, in which case, you would see traces in Phoenix for these calls. You can give it a try with:
from phoenix.trace.exporter import HttpExporter
from phoenix.trace.openai import OpenAIInstrumentor
tracer = Tracer(exporter=HttpExporter())
OpenAIInstrumentor(tracer).instrument()

In the second case, you are using a LlamaIndex application that has been instrumented. There are multiple ways to instrument a LlamaIndex application; the most common and recommended way uses a call to set_global_handler("arize_phoenix"). This causes your LlamaIndex application to submit OpenInference traces corresponding to the various events within the application (embeddings, retrievals, LLM calls, etc.). It sounds like you might be passing the same phoenix.experimental.evals.OpenAIModel instance you used for llm_generate into your LlamaIndex application. If so, it's actually a coincidence that your LlamaIndex application is still working. LlamaIndex has its own set of LLM classes (e.g., from llama_index.llms import OpenAI, see here), and we would definitely recommend using those model classes rather than the Phoenix eval model classes when you're running your LlamaIndex application. Hope that helps!
Hi Xander -- FYI, I tried the tracer code below and it didn't generate any trace in my Phoenix session. (Note I added "from phoenix.trace.tracer import Tracer", as it was needed for proper operation):

import json

from phoenix.experimental.evals import OpenAIModel, llm_generate

### The following code should allow these generated questions to be traced by Phoenix
from phoenix.trace.exporter import HttpExporter
from phoenix.trace.openai import OpenAIInstrumentor
from phoenix.trace.tracer import Tracer

tracer = Tracer(exporter=HttpExporter())
OpenAIInstrumentor(tracer).instrument()
###

def output_parser(response: str, index: int):
    try:
        return json.loads(response)
    except json.JSONDecodeError as e:
        return {"__error__": str(e)}

questions_df = llm_generate(
    dataframe=document_chunks_df,
    template=generate_questions_template,
    model=OpenAIModel(
        model_name="gpt-3.5-turbo-instruct",
    ),
    output_parser=output_parser,
)

Regards,
Give this notebook a go 🙂
Thanks, Xander. This code works for me. However, it also illustrates the issue I've been seeing:
I run with gpt-4-1106-preview, and Phoenix creates the traces properly
Then I run with gpt-3.5-turbo-instruct, and output_df is still created, but Phoenix doesn't add any traces
Then I run again with gpt-4-1106-preview, and Phoenix adds 3 more traces properly
I don't understand why Phoenix doesn't recognize gpt-3.5-turbo-instruct in this context, but it does in some other contexts, where it has been successfully generating traces for me.
I should note that when I "run again", I'm not creating a new session. I'm just running the code block beginning with "dataframe = pd.DataFrame(" again.
Hey David K., that's a good callout. At this time, we only instrument the OpenAI chat completions API. gpt-3.5-turbo-instruct uses the legacy completions API, which is set for deprecation. If that particular model is important for your work, feel free to file an enhancement ticket on our GitHub!
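To make the distinction concrete, here is a small hypothetical helper (not part of Phoenix or the OpenAI SDK, just an illustration) that maps a model name to the API surface it uses; at the time of this thread, only chat-completions calls were instrumented:

```python
# Hypothetical illustration: instruct/legacy models use the completions
# API, which was not instrumented; chat models use chat.completions,
# which was. This is a rough heuristic, not an official mapping.
def api_surface(model_name: str) -> str:
    if model_name.endswith("-instruct") or model_name.startswith(("text-", "davinci")):
        return "completions (legacy, not traced)"
    return "chat.completions (traced)"

print(api_surface("gpt-4-1106-preview"))      # -> chat.completions (traced)
print(api_surface("gpt-3.5-turbo-instruct"))  # -> completions (legacy, not traced)
```

This matches the behavior you observed: gpt-4-1106-preview runs produce traces, while gpt-3.5-turbo-instruct runs silently do not, even though both complete successfully.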
We have a new instrumentor that can do this, but it's still in beta. You can give it a try using the code below.
pip install openai openinference-instrumentation-openai opentelemetry-sdk opentelemetry-exporter-otlp

Replace the old instrumentor with the code below. This will capture and send the spans to Phoenix:
from openinference.instrumentation.openai import OpenAIInstrumentor
from opentelemetry import trace as trace_api
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk import trace as trace_sdk
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
resource = Resource(attributes={})
tracer_provider = trace_sdk.TracerProvider(resource=resource)
span_exporter = OTLPSpanExporter(endpoint="http://127.0.0.1:6006/v1/traces")
span_processor = SimpleSpanProcessor(span_exporter=span_exporter)
tracer_provider.add_span_processor(span_processor=span_processor)
trace_api.set_tracer_provider(tracer_provider=tracer_provider)
OpenAIInstrumentor().instrument()
Thanks, Roger - I'll definitely try it out! Is the intent that this will also work for non-OpenAI models? I'm particularly interested in llama-cpp, but it would be good to know more generally.

Xander -- Regarding the legacy Completions API being phased out, it looks like GPT-3.5-turbo-instruct is actually the newer model that OpenAI points to for people using those legacy Completions models: https://openai.com/blog/gpt-4-api-general-availability So I think the intent is for GPT-3.5-turbo-instruct to be around for a while. Regards, David
