If I need to ensure all traces make their way to Phoenix for short-lived Lambda jobs configured in batch mode using phoenix.otel, do I need to call tracer_provider.force_flush()? Is there a better way to do this?
Yes, for short-lived jobs (like AWS Lambda) using batch mode with phoenix.otel, you should call tracer_provider.force_flush() before the process exits to ensure all spans are exported. This is the recommended approach: batch processors buffer spans and may not export them before a short-lived process exits unless flushed manually. There is no better built-in alternative documented for this scenario; manual flushing is necessary to avoid losing traces on process exit.

For reference, see the Spring AI Java example and LangChain4j Java example, which both explicitly call forceFlush() on the tracer provider before shutdown to guarantee all traces are sent. The same principle applies to Python and other languages using Phoenix with OpenTelemetry batch span processors. A minimal Python sketch follows the sources below.

Would you like more detail or a code example?

Sources:
- Spring AI Java example
- LangChain4j Java example
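For illustration, a minimal Python sketch of that pattern, assuming the register() helper and its batch flag from the arize-phoenix-otel package; the project name and work function are placeholders:

```python
# A minimal sketch for a short-lived job, assuming phoenix.otel's register()
# helper and its batch flag; project name and work function are placeholders.
from phoenix.otel import register

tracer_provider = register(
    project_name="short-lived-job",  # placeholder project name
    batch=True,  # buffer spans in a batch span processor
)
tracer = tracer_provider.get_tracer(__name__)

def run_job():
    with tracer.start_as_current_span("job"):
        ...  # actual work goes here

if __name__ == "__main__":
    try:
        run_job()
    finally:
        # Drain the batch processor's buffer before the process exits;
        # otherwise buffered spans may never reach Phoenix.
        tracer_provider.force_flush()
```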
force_flush is the only method I’m aware of that can flush the buffer
Thanks, Roger. I’ll probably just end up hooking into atexit to force flush. https://docs.python.org/3/library/atexit.html
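A minimal sketch of that hook, assuming tracer_provider is the provider returned by phoenix.otel's register():

```python
import atexit

# Flush the batch processor's buffer when the Python process exits normally.
# tracer_provider is assumed to be the provider returned by phoenix.otel's register().
atexit.register(tracer_provider.force_flush)
```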
Does it make sense for that to be part of phoenix.otel? I understand if not, but am open to submitting a PR if you think it would be more widely beneficial 🙂
fwiw, ChatGPT 5 seems to be against using atexit for lambdas
Thanks for the heads up 👍
a more detailed explanation from Claude:

Why atexit Doesn’t Work in Lambda

1. Lambda Freezes, Not Exits

Lambda processes are not shut down but frozen once the function finishes. Since the process is still running but frozen, using shutdown() or relying on process-exit handlers is inappropriate. When Lambda finishes executing your handler:

- The process doesn’t terminate - it freezes
- atexit handlers only run on process exit
- Your Lambda container may be reused (warm start) for the next invocation
- The frozen process can sit idle for minutes/hours before eventual termination
2. atexit Timing Is Unpredictable

Even if the Lambda container eventually terminates:

- It could be hours or days later
- By then, your telemetry data is stale and useless
- Network connections may have timed out
- The collector endpoint may have changed
3. Warm Starts Break atexit Logic
```python
import atexit
from opentelemetry import trace

# This is WRONG for Lambda
atexit.register(lambda: trace.get_tracer_provider().shutdown())

def lambda_handler(event, context):
    # First invocation: container starts, atexit registered
    # Handler runs, container FREEZES (atexit doesn't run)
    # Second invocation: same container UNFREEZES
    # Handler runs again, container FREEZES again (atexit still doesn't run)
    # Eventually: container terminates, atexit runs (too late!)
    pass
```

4. You Need Immediate Flushing

Lambda requires synchronous, immediate flushing after each invocation.

WRONG - Using atexit:
```python
import atexit
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider

provider = TracerProvider()
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)

atexit.register(provider.shutdown)  # Won't run until process exit

def lambda_handler(event, context):
    # Spans created here won't be flushed until the container dies
    with tracer.start_as_current_span("work"):
        do_work()
    return {"statusCode": 200}  # Container freezes, spans lost
```

CORRECT - Explicit flush:
```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider

provider = TracerProvider()
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)

def lambda_handler(event, context):
    try:
        with tracer.start_as_current_span("work"):
            do_work()
    finally:
        # Flush immediately before the handler returns
        provider.force_flush(timeout_millis=5000)
    return {"statusCode": 200}
```

When atexit WOULD Work (But You Still Shouldn’t Use It)

The only scenario where atexit might help is for the very last invocation before container termination, but:
- You don’t know which invocation is the last one
- All previous invocations would lose their spans
- You still need explicit flushing anyway
The Right Approach

Always use explicit force_flush() in the handler’s finally block:
```python
def lambda_handler(event, context):
    try:
        # Your logic
        pass
    finally:
        # Guaranteed to run before Lambda freezes
        trace.get_tracer_provider().force_flush(timeout_millis=5000)
```

This ensures spans are exported immediately after each invocation, regardless of whether the container is reused or terminated.
I will end up doing something like that, but it might be neat if we augmented the tracer provider with a context manager. Something like this:

```python
with tracer_provider.flush_context():
    ...  # call models, etc.
```

Then when the context is exited, it can automatically force flush.
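A rough sketch of how that could look as a standalone helper, with flush_context as a hypothetical name rather than an existing phoenix.otel API:

```python
from contextlib import contextmanager

@contextmanager
def flush_context(tracer_provider, timeout_millis=5000):
    # Hypothetical helper: run the body of the with-block, then force-flush the
    # provider's span processors on exit, even if the body raises.
    try:
        yield tracer_provider
    finally:
        tracer_provider.force_flush(timeout_millis=timeout_millis)

# Usage:
# with flush_context(tracer_provider):
#     call_models()
```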
