Hello team, I want to fetch a specific subset of data for annotation. The selection could be based on a list of `trace_id`s, on `attributes.session.id`, or on any other filter. How can I get only that data?

Current solution: fetch all data between a start and end time, then filter `spans_df` locally:

```python
from datetime import datetime, timedelta

from phoenix.client import Client

client = Client()
spans_df = client.spans.get_spans_dataframe(
    project_identifier="agentic_service_project",
    start_time=datetime.now() - timedelta(days=7),
    end_time=datetime.now(),
    timeout=120,
)
```

Issues:
- There is a lot of data in the project.
- Fetching all of it takes a long time.
- It fails with `httpcore.RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read)`.
- I have to keep tuning the `limit` argument.

Is there a better, more efficient way to do this? Thanks
Yes. You can filter spans at query time with Phoenix's query DSL, which is more efficient than fetching everything and filtering locally. Use the `SpanQuery` class to express filters such as `trace_id in [...]`, `attributes['session.id'] == ...`, or conditions on any other attribute, and pass the query to `client.query_spans()` or `client.spans.get_spans_dataframe()`. Only matching spans are returned, which reduces data transfer and speeds up retrieval.

Example for filtering by a list of trace IDs:

```python
import phoenix as px
from phoenix.trace.dsl import SpanQuery

client = px.Client()  # query_spans() is a method of this client
query = SpanQuery().where("trace_id in ['id1', 'id2', 'id3']")
spans_df = client.query_spans(query, project_name="agentic_service_project")
```

With the newer `phoenix.client.Client` used in the question, the same kind of query can be passed through the `query` argument of `client.spans.get_spans_dataframe()` (there, `SpanQuery` is imported from `phoenix.client.types.spans`).

You can also filter by attributes, e.g. `attributes['session.id'] == 'your_session_id'`. This avoids loading all data and lets you control the result size. Would you like more detail or examples for other filters?

Sources:
- Extract Data from Spans
- SpanQuery DSL
- GitHub Issue: Filter by Span ID
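The answer writes the filter string by hand. When the IDs to annotate come from a Python list, a small helper can generate the expression and quote each value safely. This is a minimal sketch: `trace_id_filter` and `session_filter` are hypothetical helpers, and the expression syntax is taken from the answer above rather than verified against the DSL grammar.

```python
from typing import Iterable


def trace_id_filter(trace_ids: Iterable[str]) -> str:
    """Build a `trace_id in [...]` filter expression (hypothetical helper).

    Duplicates are dropped while preserving order, and each ID is quoted
    with repr() so embedded quotes cannot break the expression.
    """
    unique_ids = list(dict.fromkeys(str(t) for t in trace_ids))
    if not unique_ids:
        raise ValueError("trace_ids must contain at least one ID")
    quoted = ", ".join(repr(t) for t in unique_ids)
    return f"trace_id in [{quoted}]"


def session_filter(session_id: str) -> str:
    """Build an equality filter on the session ID (hypothetical helper).

    The attributes['session.id'] form is copied from the answer above and
    is an assumption about the DSL, not a verified grammar rule.
    """
    return f"attributes['session.id'] == {str(session_id)!r}"


# Example usage (needs a running Phoenix server, so it is left commented out):
# from phoenix.trace.dsl import SpanQuery
# query = SpanQuery().where(trace_id_filter(["id1", "id2", "id3"]))
```

The resulting string goes straight into `SpanQuery().where(...)`, and the query is then passed to `client.query_spans()` or `client.spans.get_spans_dataframe()` as described in the answer. Quoting with `repr()` instead of manual concatenation keeps an ID that contains a quote character from producing an invalid expression.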
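If even a filtered query over seven days is slow or ends in the `RemoteProtocolError` from the question, one option is to split the time range into smaller windows and issue one request per window, so each response stays small. This is an illustrative sketch, not a documented Phoenix pattern: `time_windows` and `fetch_in_windows` are hypothetical helpers, and the actual Phoenix call is passed in as a function so the windowing logic can be tested without a server.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def time_windows(
    start: datetime, end: datetime, step: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive (window_start, window_end) pairs covering start..end.

    Assumes the server treats start_time as inclusive and end_time as
    exclusive, so adjacent windows neither overlap nor leave gaps.
    """
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)  # last window may be shorter
        yield cursor, window_end
        cursor = window_end


def fetch_in_windows(
    fetch: Callable[[datetime, datetime], T],
    start: datetime,
    end: datetime,
    step: timedelta,
) -> List[T]:
    """Call `fetch` once per window and collect the results in order."""
    return [fetch(ws, we) for ws, we in time_windows(start, end, step)]


# Example usage with the client from the question (not executed here):
# import pandas as pd
# now = datetime.now()
# frames = fetch_in_windows(
#     lambda s, e: client.spans.get_spans_dataframe(
#         query=query,
#         project_identifier="agentic_service_project",
#         start_time=s,
#         end_time=e,
#         timeout=120,
#     ),
#     start=now - timedelta(days=7),
#     end=now,
#     step=timedelta(hours=6),
# )
# spans_df = pd.concat(frames)
```

Each call is still subject to the `limit` argument, so a window should be small enough that its spans fit under the limit you pass. If the boundary semantics turn out to differ from the assumption in the comment, duplicate spans can be dropped by span ID after concatenating.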
