Understanding Querying for Nested Span Data

·Apr 15, 2025 08:00 PM

I'm trying to understand how the querying works. I have a trace like this screenshot. With a query like `SpanQuery().where("name == 'Suggested - Analyzer'")` I get the root span, which has input properties and output, but I'm not able to retrieve the input prompt, token counts, model, or any of that. That's all in the 2nd span. How would you get to that?

RunLLM
APP
·
To retrieve specific attributes from child spans, you can use `SpanQuery` to filter and select the desired attributes. In your case, if the input prompt, token counts, and model information are in the second span, you need to adjust your query to target that span specifically. You can use the `parent_id` to join the child span with the root span and extract the necessary attributes.

Here's a general approach:
1. *Query the Root Span*: Use `SpanQuery` to filter for the root span and select its attributes.
2. *Query the Child Span*: Use another `SpanQuery` to filter for the child span using the `parent_id` and select the desired attributes.
3. *Join the DataFrames*: Use pandas to join the two DataFrames on the `span_id` or `parent_id`.

Here's a code snippet to illustrate this:
```
import pandas as pd
import phoenix as px
from phoenix.trace.dsl import SpanQuery

# Query for the root span
query_root = SpanQuery().where("name == 'Suggested - Analyzer'").select(
    input="input.value",
    output="output.value",
)

# Query for the child span; parent_id is selected as span_id so that
# child rows can be matched to their parent span
query_child = SpanQuery().where("span_kind == 'LLM'").select(
    span_id="parent_id",
    model="attributes.llm.model",
    token_count="attributes.llm.token_count.total",
)

# Execute the queries
root_df = px.Client().query_spans(query_root)
child_df = px.Client().query_spans(query_child)

# Join the DataFrames
joined_df = pd.concat([root_df, child_df], axis=1, join="inner")
print(joined_df)
```
This approach assumes that the child span is of kind 'LLM' and contains the attributes you are interested in. Adjust the `where` clause and `select` parameters as needed to match your specific trace structure.

Would you like more detail on any part of this process?

Sources:
- https://github.com/Arize-ai/phoenix/issues/5652
- https://github.com/Arize-ai/phoenix/pull/6584
Peter K.
·
Yeah, I'm looking for a `SpanQuery` to get them out programmatically

Peter K.

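Just some basic code I've been playing with that isn't getting me there 😅

import phoenix as px
from phoenix.trace.dsl import SpanQuery

# Fetch up to 5 spans named 'Suggested - Analyzer' from the 'prod' project
query = SpanQuery().where("name == 'Suggested - Analyzer'")
spans = px.Client().query_spans(query, project_name='prod', timeout=60, limit=5)

# iterrows() yields (index, row) pairs
for span_id, row in spans.iterrows():
    print(span_id, row)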

Peter K.
·
Hmm, I don't think either of those is ideal. We're doing about 10,000 traces an hour and growing rapidly, so loading them all likely won't work for us. That does give me an idea, though: I should be able to pull the top-level spans as I am now and then query for spans that have that `parent_id`, right?
Peter K.
·
Awesome! Ok I'll give that a shot. Thanks!
Peter K.
·
Awesome! That works. This should get me pretty far. I'll be back when I go through our agentic workflows I'm sure 😅. Thanks again!
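
For reference, the two-step idea from the thread (pull a small batch of top-level spans, then look up only the spans whose parent is one of them) ends with a matching step on the client side. The following is a minimal sketch of that matching step only, not the Phoenix API: the records are hypothetical stand-ins for rows from the two queries (for example, what `DataFrame.to_dict("records")` would give you), and the key names `span_id`, `parent_id`, `model`, and `total_tokens` are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical stand-ins for rows returned by two span queries. The key names
# are illustrative and are not confirmed Phoenix column names.
root_spans = [
    {"span_id": "root-1", "name": "Suggested - Analyzer"},
    {"span_id": "root-2", "name": "Suggested - Analyzer"},
]
child_spans = [
    {"span_id": "llm-1", "parent_id": "root-1", "model": "model-a", "total_tokens": 120},
    {"span_id": "llm-2", "parent_id": "root-2", "model": "model-a", "total_tokens": 95},
    {"span_id": "llm-3", "parent_id": "other", "model": "model-b", "total_tokens": 40},
]


def attach_children(roots, children):
    """Return one dict per root span with its direct children under 'children'."""
    by_parent = defaultdict(list)
    for child in children:
        by_parent[child["parent_id"]].append(child)
    # Roots with no matching child get an empty list rather than being dropped.
    return [{**root, "children": by_parent.get(root["span_id"], [])} for root in roots]


joined = attach_children(root_spans, child_spans)
for row in joined:
    models = [child["model"] for child in row["children"]]
    tokens = sum(child["total_tokens"] for child in row["children"])
    print(row["span_id"], models, tokens)
```

Grouping children by parent first keeps the match to a single pass over each list, and children whose parent is not in the current batch of roots are simply ignored, which suits paging through roots a few at a time instead of loading every trace.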
