🔒[private user] Thanks for the explanation. One clarification: each evaluation task is scheduled to run every 2–10 minutes, so individual tasks are not long-running by design.
Given that, it seems the "context deadline exceeded" errors are more related to span volume and export throughput during each run than to task duration itself.
We’ll double-check that we’re using the Batch Span Processor everywhere (no SimpleSpanProcessor anywhere in the pipeline), and review our batch size, export delay, and timeout settings.
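For reference, this is roughly the configuration we're reviewing, a sketch assuming the OpenTelemetry Go SDK with an OTLP/gRPC exporter; the exporter choice and the specific values are illustrative, not our exact production settings:

```go
package main

import (
	"context"
	"time"

	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func newTracerProvider(ctx context.Context) (*sdktrace.TracerProvider, error) {
	// OTLP/gRPC exporter; endpoint comes from the usual OTEL_EXPORTER_* env vars.
	exporter, err := otlptracegrpc.New(ctx)
	if err != nil {
		return nil, err
	}

	// Batch processor with the knobs mentioned above (values are placeholders).
	bsp := sdktrace.NewBatchSpanProcessor(
		exporter,
		sdktrace.WithMaxQueueSize(4096),            // buffer for bursty evaluation runs
		sdktrace.WithMaxExportBatchSize(512),       // spans per export request
		sdktrace.WithBatchTimeout(2*time.Second),   // max delay before a partial batch ships
		sdktrace.WithExportTimeout(30*time.Second), // per-export deadline
	)

	return sdktrace.NewTracerProvider(sdktrace.WithSpanProcessor(bsp)), nil
}
```

Since our tasks finish within minutes, we also call ForceFlush/Shutdown on the provider at the end of each run so queued spans aren't still in flight when the task's context is cancelled.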
Question: is there a recommended upper bound on spans per evaluation run, or a best practice for splitting evaluations by time window / filtering spans in high-throughput production environments, to avoid hitting these deadlines?