Hello, I'm a new user and running into a couple of issues and would love some clarification/help:
Calling px.Client().trace_dataset().save() from within the python script that's running the LLM calls/instrumentor returns empty. My hope was to save all traces automatically once the process completes since data on Phoenix is not persistent atm. Is there a way to do this that's not manual?
Not all of the traces are showing up in the UI. I am forcing the python script to run using a single thread to isolate the issue and manually logging (which gets the right # of traces). Is there another way to isolate this issue--instrumentor vs. trace provider?
Pooja R. Hey, on the first point: persistence is still something we're actively working on, but for now you can save and load trace datasets in a notebook using these commands https://docs.arize.com/phoenix/tracing/how-to-tracing/save-and-load-traces On the second point, can you provide a bit more detail, like the version you are running, which instrumentors you have configured, where you've configured them, etc.? Any details you can give us on your setup will help us isolate what's going on.
Is there a way to save and load datasets that's not in a notebook? I have the Phoenix server running in a docker container (v3.18.0) and my prediction pipeline is a standalone python script, and px.Client(endpoint='http://localhost:6006').get_trace_dataset() returns empty. Sorry, the instrumentor is Langchain - openinference.instrumentation.langchain (0.1.14). I have the instrumentor initialized as the first thing, before any other initialization.
# imports for reference
from opentelemetry import trace as trace_api
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk import trace as trace_sdk
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from openinference.instrumentation.langchain import LangChainInstrumentor
from openinference.semconv.resource import ResourceAttributes

resource = Resource(attributes={
    ResourceAttributes.PROJECT_NAME: project_name
})
endpoint = "http://127.0.0.1:6006/v1/traces"
tracer_provider = trace_sdk.TracerProvider(resource=resource)
span_exporter = OTLPSpanExporter(endpoint)
span_processor = SimpleSpanProcessor(span_exporter=span_exporter)
tracer_provider.add_span_processor(span_processor=span_processor)
trace_api.set_tracer_provider(tracer_provider)
LangChainInstrumentor().instrument()
Also, please ignore #2. It's on our end.
I would still like to find a way around #1
No problem! Just so I'm not missing something, I can call save() via a client only in a notebook/ipython environment?
So save() will place your data in the PHOENIX_WORKING_DIR (typically ~/.phoenix). If you want to persist it yourself, you can just grab the pandas dataframes via TraceDataset.get_dataframe() and save those as something like parquet. You can always view them later by wrapping them in a TraceDataset(dataframe=df). A bit awkward, I know, but it's the best we've got right now.
