Hi, I've run into an issue when loading saved data with the Phoenix client. Here are the steps I followed:
Spin up the server using Docker.
Add a few dummy entries with tracing.
Save the dummy spans as a dataframe (CSV) and as the default Parquet.
When I load the data and create a TraceDataset object, it works perfectly; the issue is that when I use the phoenix_client.log_traces method, it throws a TypeError and a ValueError every time I load. Any help is much appreciated, thanks for considering.
Hi Anuraag T., thank you for trying out Phoenix! Is there any chance you can share a code snippet so we can help diagnose what's happening?
Sure Dustin N., here is the code snippet I used in a Jupyter notebook. My Phoenix server is hosted on Docker.
import phoenix as px
import pandas as pd
pxc = px.Client(endpoint="http://127.0.0.1:6006")
# save traces as dataframe.
pxc.get_spans_dataframe(project_name="dummyDemo").to_csv("./demo1.csv", index=False)
# load the dataframe
df2 = pd.read_csv("./demo1.csv")
# create trace dataset from dataframe
my_traces2 = px.TraceDataset(dataframe=df2)
# log the traces to default project
# this line throws ValueError
pxc.log_traces(
trace_dataset=my_traces2,
project_name="default",
)
Any chance you can show the TypeError you're getting? Also, just to verify, what version of Phoenix are you running?
Sure, here is the traceback of the error. I tried both version 4.5.0 and the latest, 4.7.1.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In [61], line 1
----> 1 pxc.log_traces(
2 trace_dataset=my_traces2,
3 project_name="default",
4 )
6 # session = px.launch_app(trace=px.TraceDataset.load(trace_id, "./"))
File e:\Projects\py3\env\lib\site-packages\phoenix\session\client.py:275, in Client.log_traces(self, trace_dataset, project_name)
273 project_name = project_name or get_env_project_name()
274 spans = trace_dataset.to_spans()
--> 275 otlp_spans = [
276 ExportTraceServiceRequest(
277 resource_spans=[
278 ResourceSpans(
279 resource=Resource(
280 attributes=[
281 KeyValue(
282 key="openinference.project.name",
283 value=AnyValue(string_value=project_name),
284 )
285 ]
286 ),
287 scope_spans=[ScopeSpans(spans=[encode_span_to_otlp(span)])],
288 )
289 ],
290 )
291 for span in spans
292 ]
293 for otlp_span in otlp_spans:
294 serialized = otlp_span.SerializeToString()
File e:\Projects\py3\env\lib\site-packages\phoenix\session\client.py:275, in <listcomp>(.0)
273 project_name = project_name or get_env_project_name()
274 spans = trace_dataset.to_spans()
--> 275 otlp_spans = [
276 ExportTraceServiceRequest(
277 resource_spans=[
278 ResourceSpans(
279 resource=Resource(
280 attributes=[
281 KeyValue(
282 key="openinference.project.name",
283 value=AnyValue(string_value=project_name),
284 )
285 ]
286 ),
287 scope_spans=[ScopeSpans(spans=[encode_span_to_otlp(span)])],
288 )
289 ],
290 )
291 for span in spans
292 ]
293 for otlp_span in otlp_spans:
294 serialized = otlp_span.SerializeToString()
File e:\Projects\py3\env\lib\site-packages\phoenix\trace\trace_dataset.py:185, in TraceDataset.to_spans(self)
183 if end_time is pd.NaT:
184 end_time = None
--> 185 yield json_to_span(
186 {
187 "name": row["name"],
188 "context": context,
189 "span_kind": row["span_kind"],
190 "parent_id": row.get("parent_id"),
191 "start_time": cast(datetime, row["start_time"]).isoformat(),
192 "end_time": end_time.isoformat() if end_time else None,
193 "status_code": row["status_code"],
194 "status_message": row.get("status_message") or "",
195 "attributes": attributes,
196 "events": row.get("events") or [],
197 "conversation": row.get("conversation"),
198 }
199 )
File e:\Projects\py3\env\lib\site-packages\phoenix\trace\span_json_decoder.py:72, in json_to_span(data)
70 data["span_kind"] = SpanKind(data["span_kind"])
71 data["status_code"] = SpanStatusCode(data["status_code"])
---> 72 data["events"] = [
73 SpanException(
74 message=(event.get("attributes") or {}).get(EXCEPTION_MESSAGE) or "",
75 timestamp=datetime.fromisoformat(event["timestamp"]),
76 )
77 if event["name"] == "exception"
78 else SpanEvent(
79 name=event["name"],
80 attributes=event.get("attributes") or {},
81 timestamp=datetime.fromisoformat(event["timestamp"]),
82 )
83 for event in data["events"]
84 ]
85 data["conversation"] = (
86 SpanConversationAttributes(**data["conversation"])
87 if data["conversation"] is not None
88 else None
89 )
90 return Span(**data)
File e:\Projects\py3\env\lib\site-packages\phoenix\trace\span_json_decoder.py:77, in <listcomp>(.0)
70 data["span_kind"] = SpanKind(data["span_kind"])
71 data["status_code"] = SpanStatusCode(data["status_code"])
72 data["events"] = [
73 SpanException(
74 message=(event.get("attributes") or {}).get(EXCEPTION_MESSAGE) or "",
75 timestamp=datetime.fromisoformat(event["timestamp"]),
76 )
---> 77 if event["name"] == "exception"
78 else SpanEvent(
79 name=event["name"],
80 attributes=event.get("attributes") or {},
81 timestamp=datetime.fromisoformat(event["timestamp"]),
82 )
83 for event in data["events"]
84 ]
85 data["conversation"] = (
86 SpanConversationAttributes(**data["conversation"])
87 if data["conversation"] is not None
88 else None
89 )
90 return Span(**data)
TypeError: string indices must be integers
I see, thanks for the report, this definitely seems like a bug.
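For anyone hitting the same error: the traceback above points at the spans dataframe's object columns. A minimal sketch of what likely goes wrong (the "events" column name is taken from the traceback; this is a hypothetical reproduction, not Phoenix's own code): a CSV round trip flattens a list-of-dicts column into its string repr, so iterating over it yields single characters, and `event["name"]` raises the TypeError.

```python
import io
import pandas as pd

# In memory, "events" holds a list of dicts per row (as in a spans dataframe).
df = pd.DataFrame(
    {"events": [[{"name": "exception", "timestamp": "2024-01-01T00:00:00"}]]}
)

# Round-trip through CSV, as in the snippet above.
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
df2 = pd.read_csv(buf)

# After reading back, the cell is a string, not a list.
events = df2.loc[0, "events"]
print(type(events).__name__)  # str

# Iterating a string yields characters, so event["name"] fails with
# "string indices must be integers" inside the span decoder.
try:
    [event["name"] for event in events]
except TypeError as exc:
    print("TypeError:", exc)
```

This also explains why the Parquet save behaves differently: Parquet preserves nested types, while CSV does not.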
A couple of things: 1. The log_traces method was intended as a stopgap before we supported persistence for Phoenix; is it sufficient for your use case to simply log the traces to Phoenix once and rely on persistence to keep them around? 2. If this code path is necessary, would you mind filing a bug report? We can prioritize the issue and fix it as soon as we can.
In my case, I'm running the server in Docker, so I can't use px.launch_app to re-run the server; a working log_traces is crucial for me. Can you tell me the steps to file the bug report?
Hi Anuraag T.! You can file issues on the GitHub repo: https://github.com/Arize-ai/phoenix/issues
Even if you are using Docker, you can configure Phoenix to use either a SQLite or Postgres database with environment variables: https://docs.arize.com/phoenix/deployment/docker
In the example docker compose file we point to a Postgres database, so data is persisted between Phoenix sessions.
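For readers following along, a minimal compose sketch along those lines (service names, image tag, and the Postgres credentials here are placeholders, not the exact file from the docs; `PHOENIX_SQL_DATABASE_URL` is the environment variable Phoenix reads for its database connection):

```yaml
services:
  phoenix:
    image: arizephoenix/phoenix:latest
    ports:
      - "6006:6006"
    environment:
      # Point Phoenix at the Postgres service below so traces persist
      # across container restarts.
      - PHOENIX_SQL_DATABASE_URL=postgresql://postgres:postgres@db:5432/postgres
    depends_on:
      - db
  db:
    image: postgres
    environment:
      - POSTGRES_PASSWORD=postgres
```

Omitting `PHOENIX_SQL_DATABASE_URL` falls back to SQLite, which is the setup described below in this thread.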
Yes, I'm using the Docker deployment with SQLite as the persistence DB.
Is there a reason you aren't seeing traces persisted between deployments?
No, I'm seeing the traces on the deployment; it's just that I can't log traces to different projects.
Traces can be logged to different projects either by setting the resource or by using the using_project context manager. The latter is mainly intended for use in notebooks, but if you aren't doing any other instrumentation, you can try it and see if it works.
Sure, will do. Thanks.
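As an aside, until the CSV code path is fixed, one workaround sketch is to re-parse the stringified object columns before constructing the TraceDataset. This is a hypothetical helper, not part of the Phoenix API; the column name "events" is assumed from the traceback, and `ast.literal_eval` works here because pandas writes the Python repr of the list into the CSV cell.

```python
import ast
import pandas as pd

def restore_object_columns(df, columns=("events",)):
    """Parse columns whose cells became string reprs after a CSV round trip."""
    for col in columns:
        if col in df.columns:
            df[col] = df[col].apply(
                lambda v: ast.literal_eval(v) if isinstance(v, str) else v
            )
    return df

# Simulate a cell as it comes back from pd.read_csv: a stringified list.
df = pd.DataFrame(
    {"events": ["[{'name': 'exception', 'timestamp': '2024-01-01T00:00:00'}]"]}
)
df = restore_object_columns(df)
print(df.loc[0, "events"][0]["name"])  # prints: exception
```

With the columns restored to real lists of dicts, `px.TraceDataset(dataframe=df)` followed by `log_traces` should no longer hit the string-iteration TypeError, though saving to Parquet instead of CSV avoids the problem entirely.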
