Hi everyone, I'm getting continuous RESOURCE_EXHAUSTED errors when using Phoenix for tracing with a LangGraph application.

Setup:
- Phoenix auto-instrumentation with batch=True
- Each agent request generates multiple spans (custom spans + LangChain auto-instrumentation + LangGraph execution spans)

The problem: I see constant warnings like "Transient error StatusCode.RESOURCE_EXHAUSTED encountered while exporting traces to localhost:4317, retrying in 1s/2s/4s/8s/16s/32s...".

Has anyone else seen this issue?
- Are there recommended rate-limiting or batching configurations for high-volume applications?
- Is there a way to sample traces (e.g., only send every 10th trace) to reduce load?
- Can I filter out certain types of spans before they're sent to Phoenix?
The RESOURCE_EXHAUSTED error means the OpenTelemetry collector or Phoenix backend cannot keep up with the rate of span ingestion, often because the batch processor queue fills up or the database (e.g., Postgres) is slow to insert spans. You can tune the batch processor settings (batch size, queue size, export timeout) to help, but Phoenix itself has no built-in rate limiter or sampling configuration. To filter or sample spans, implement a custom SpanProcessor that drops or modifies spans before export. Sampling every Nth trace is not natively supported, but it can be implemented in a custom processor.

To filter out certain spans, subclass BatchSpanProcessor and override on_start/on_end to skip spans matching your criteria (e.g., by name or attribute). Example:

```python
from phoenix.otel import register, BatchSpanProcessor
from opentelemetry.context import Context
from opentelemetry.sdk.trace import ReadableSpan, Span


class FilteringSpanProcessor(BatchSpanProcessor):
    def _filter_condition(self, span: Span) -> bool:
        # Example: filter out spans named "secret_span"
        return span.name == "secret_span"

    def on_start(self, span: Span, parent_context: Context) -> None:
        if self._filter_condition(span):
            return
        super().on_start(span, parent_context)

    def on_end(self, span: ReadableSpan) -> None:
        if self._filter_condition(span):
            return
        super().on_end(span)


tracer_provider = register(batch=True)
tracer_provider.add_span_processor(
    FilteringSpanProcessor(
        endpoint="http://localhost:6006/v1/traces",
        protocol="http/protobuf",
    )
)
```

For more, see Phoenix advanced span filtering and batch processor tuning. Would you like more detail on sampling or performance tuning?

Sources:
- Batch processor tuning and OOM issues
- Phoenix advanced span filtering
You can try setting the following environment variable to a large value to increase the span queue size inside the server:

```
PHOENIX_MAX_SPANS_QUEUE_SIZE=1000000
```
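Assuming the Phoenix server reads this variable at startup, you'd set it in the environment before launching, e.g. (the `phoenix serve` entry point is from the arize-phoenix package):

```shell
# Raise the server-side span queue before starting Phoenix,
# then launch the server in the same shell so it inherits the variable.
export PHOENIX_MAX_SPANS_QUEUE_SIZE=1000000
phoenix serve
```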