I am having some issues with evaluations.
I have my own server that stores traces in a database
When I get a request to view traces, I filter the spans and start a new Phoenix session, feeding the TraceDataset into it:

session = px.launch_app(trace=trace_ds)

I created a custom eval (copied from the Bring Your Own Evaluator page). While I can run the evals, the problems start with displaying the results.
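As a minimal sketch of that filtering step (pandas assumed; the column name context.trace_id follows the default Phoenix span schema, and the example rows and trace IDs are hypothetical, not the author's data):

```python
import pandas as pd

# Hypothetical span rows pulled from the database; the real query and
# any extra columns are specific to the author's own schema.
spans = pd.DataFrame(
    {
        "context.trace_id": ["t1", "t1", "t2"],
        "name": ["llm", "retriever", "llm"],
    }
)

# Keep only the spans belonging to the requested traces before wrapping
# the frame in a TraceDataset.
requested = {"t1"}
filtered = spans[spans["context.trace_id"].isin(requested)]
print(len(filtered))  # 2
```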
Using this code does not work at all:

px.Client().log_evaluations(
    SpanEvaluations(eval_name="bro", dataframe=relevance_classifications)
)

I have been able to show the eval results with this:
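One thing worth checking when log_evaluations seems to have no effect: SpanEvaluations expects its dataframe to be indexed by span IDs that match spans in the running session. A minimal sketch of that shape (pandas assumed; the span IDs, labels, and scores here are made up for illustration):

```python
import pandas as pd

# Hypothetical evaluation results -- in practice the index holds the real
# span IDs the evaluator ran over, and the columns come from the evaluator.
relevance_classifications = pd.DataFrame(
    {
        "label": ["relevant", "irrelevant"],
        "score": [1, 0],
        "explanation": ["matches the query", "off-topic"],
    },
    index=pd.Index(["span_id_1", "span_id_2"], name="span_id"),
)

print(relevance_classifications.index.name)  # span_id
```

If the index does not carry span IDs known to the session, the evaluations have nothing to attach to in the UI.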
dataset = px.active_session().get_trace_dataset()
dataset.append_evaluations(
    SpanEvaluations(eval_name="bro", dataframe=relevance_classifications)
)
px.close_app()
session = px.launch_app(trace=dataset)
session.view()

But with this, when I rerun evaluations, it fails at px.active_session().get_trace_dataset().
Thank you so much for trying out Phoenix, Nabeegh A.! Is there any chance you can show us a little bit of what your data looks like so we can reproduce what’s going on?
Dustin N. My data looks exactly like the default phoenix schema, with some added fields. This is how I construct a dataset:
def construct_traces_dataset(traces: list[dict]):
    def process_trace(trace: dict):
        trace["start_time"] = str(trace["start_time"])
        trace["end_time"] = str(trace["end_time"])
        for event in trace["events"]:
            event["timestamp"] = str(event["timestamp"])
        return trace

    return TraceDataset(
        json_lines_to_df([json.dumps(process_trace(trace)) for trace in traces])
    )

Also, I'd like to point out that with Phoenix version 4.0.0 and above, we support persistence, so you won't have to keep restarting your app.
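A small thing to double-check in process_trace: str() on a datetime produces a space-separated timestamp rather than ISO 8601 with a "T" separator. If whatever parses the JSON lines expects ISO 8601, isoformat() may be the safer choice. A quick stdlib illustration:

```python
from datetime import datetime, timezone

ts = datetime(2024, 1, 1, 12, 30, 0, tzinfo=timezone.utc)

# str() uses a space between the date and time parts...
assert str(ts) == "2024-01-01 12:30:00+00:00"

# ...while isoformat() emits the ISO 8601 "T" separator.
assert ts.isoformat() == "2024-01-01T12:30:00+00:00"
```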
Can you provide a small code snippet that fails, so we can reproduce it?
