I'm curious: when you have a deeply nested trace of spans (Mastra Workflows, in my case), is there any deduplication done on the inputs and outputs? If the context going in or the results coming out are not small (tens of KB) and are identical "on the way in" and/or "on the way out", is anything done to dedup all that data? Also, since Postgres is the persistence layer, are we TOASTing our rows when the amount of data is >8kb?
It seems likely that Phoenix does not perform automatic deduplication of identical input/output data across deeply nested spans; each span's attributes (including large context or results) are stored as-is, even if repeated in the trace. There is no mention of deduplication logic for span input/output in the available documentation or code discussions for Mastra or Phoenix in general. If you have repeated large payloads, they will be stored once for each span that includes them, which increases storage usage. Would you like more detail on how to manage or mitigate this?
Regarding Postgres, my understanding is that Phoenix relies on Postgres' built-in TOAST (The Oversized-Attribute Storage Technique) for handling large row data. Note that TOAST kicks in once a row exceeds roughly 2 kB by default, not 8 kB (8 kB is the default page size). Postgres then compresses large values and/or stores them out-of-line, keeping a pointer in the main table, so you do not need to configure this separately for Phoenix. This is standard Postgres behavior and is not altered by Phoenix's persistence layer configuration. TOAST works per value, though, so it does not deduplicate identical payloads stored in different rows. Would you like more technical detail on how TOAST works or how Phoenix interacts with it?
Sources: https://github.com/Arize-ai/phoenix/pull/5250
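To get a feel for how much repeated payload a trace actually carries, a rough sketch like the one below can help. It hashes each payload and counts the bytes that have already been seen elsewhere in the same trace. The span shape (plain dicts with an "attributes" mapping) and the function name are hypothetical, made up for illustration; "output.value" is assumed as the output counterpart of "input.value". This is not how Phoenix stores or inspects spans.

```python
import hashlib

# Assumed payload attribute keys; adjust to whatever your spans really carry.
PAYLOAD_KEYS = ("input.value", "output.value")


def duplicate_payload_bytes(spans):
    """Return (total_bytes, redundant_bytes) for the payload attributes of one trace.

    `spans` is a hypothetical list of dicts, each with an "attributes" mapping
    of attribute name to string value.
    """
    seen = set()  # digests of payloads that have already been counted once
    total = 0
    redundant = 0
    for span in spans:
        attributes = span.get("attributes", {})
        for key in PAYLOAD_KEYS:
            value = attributes.get(key)
            if value is None:
                continue
            data = value.encode("utf-8")
            digest = hashlib.sha256(data).digest()
            total += len(data)
            if digest in seen:
                # Same bytes were already stored by an earlier span.
                redundant += len(data)
            else:
                seen.add(digest)
    return total, redundant
```

If redundant_bytes is a large fraction of total_bytes, trimming attributes before they reach storage is probably worth the effort.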
What are the best practices for moderate-volume production use?
You can try deploying an OpenTelemetry Collector that transforms the spans, e.g. deletes the attribute when the span is not the root span, or when the span matches a certain name, and then re-exports them.
processors:
  transform:
    error_mode: ignore
    trace_statements:
      - context: span
        statements:
          - delete_key(span.attributes, "input.value") where IsRootSpan() == false

And yes, large spans will be TOASTed.
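If you want to dry-run the effect of that rule on some exported span data before standing up a collector, here is a minimal sketch of the same logic in plain Python. The dict-based span shape, the "parent_id" field, and the function name are hypothetical; the collector itself evaluates the statement in its own transform language, not with this code.

```python
def strip_non_root_inputs(spans, key="input.value"):
    """Return copies of `spans` with `key` removed from every non-root span.

    A span is treated as the root when its "parent_id" is None or missing
    (an assumption about the data shape, for illustration only).
    """
    result = []
    for span in spans:
        attributes = dict(span.get("attributes", {}))  # copy, leave input untouched
        if span.get("parent_id") is not None:
            # Non-root span: drop the large payload attribute if present.
            attributes.pop(key, None)
        result.append({**span, "attributes": attributes})
    return result
```

Comparing the total attribute size before and after gives a rough idea of how much storage the rule would save for a typical trace.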
