any thoughts on storing actual vectors in phoenix datasets or traces? is that explicitly discouraged? are they captured in traces by default RunLLM
Storing actual vectors (embeddings) in Phoenix datasets is supported and common: Phoenix datasets can include embedding columns, which are visualized and analyzed for drift, clustering, and retrieval performance. For traces, embeddings can be included as span attributes, but whether they are captured by default depends on your instrumentation: some frameworks (e.g., LlamaIndex, LangChain) log embeddings automatically, while with manual tracing you must explicitly add them as attributes. Neither datasets nor traces explicitly discourage storing vectors, but for traces you may want to mask or limit embedding data for performance or privacy reasons, especially with large vectors or high-volume workloads. If you want to avoid logging embeddings in traces, you can mask or exclude them using Phoenix's span attribute masking features. For more, see EmbeddingColumnNames in datasets, customizing traces, and masking span attributes. Would you like more detail or code examples?
Sources:
- EmbeddingColumnNames in datasets
- Masking Span Attributes
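As a minimal sketch of the manual-tracing case: embeddings end up on a span as flattened attributes. The key pattern below (`embedding.embeddings.<i>.embedding.*`) is my reading of the OpenInference semantic conventions; verify the exact attribute names against the spec before relying on them.

```python
# Sketch: flatten (text, vector) pairs into OpenInference-style span attributes.
# The attribute key names are an assumption based on the OpenInference
# conventions -- check the spec for the authoritative names.

def embedding_attributes(texts, vectors):
    """Flatten parallel lists of chunk texts and their embedding vectors
    into a flat dict of span attributes."""
    attrs = {}
    for i, (text, vec) in enumerate(zip(texts, vectors)):
        attrs[f"embedding.embeddings.{i}.embedding.text"] = text
        attrs[f"embedding.embeddings.{i}.embedding.vector"] = vec
    return attrs

attrs = embedding_attributes(["hello world"], [[0.1, 0.2, 0.3]])
# Each real embedding is typically hundreds to thousands of floats per chunk,
# which is why you might want to hide or mask these attributes at volume.
```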
Hey Andrew S. - this is an interesting question; I lean toward the vectors being less important than the text itself. Storing the vectors does unlock some really interesting EDA flows via dimensionality reduction and clustering when the embeddings are dense, but that requires a pretty savvy DS persona. It can be interesting at volume, however.
I was mostly curious for the purpose of:
- faster experiments
- improving performance of Phoenix's web UI
Searching for traces in the web UI is super slow and I was wondering if dropping embeddings from traces would speed it up.
Anything to reduce the payloads could help. So setting OPENINFERENCE_HIDE_EMBEDDING_VECTORS might help a bit. https://arize-ai.github.io/openinference/spec/configuration.html
