Data Ingestion Tracing with Arize and Phoenix for RAG

Sebastian S.

·May 16, 2025 10:41 PM

What do Arize and/or Phoenix offer in terms of tracing data ingestion for documents used in RAG?

RunLLM
Phoenix provides detailed tracing for data ingestion and document retrieval in RAG pipelines. It captures spans for each stage (loading, indexing, storing, querying) and records which documents are retrieved for each query, including their content and metadata, so retrieval quality can be inspected and evaluated directly from traces. Users can visualize, filter, and analyze the flow of documents through the RAG system, and attach evaluations (e.g., relevance) at the document or span level for further analysis and troubleshooting.

For custom RAG solutions, manual instrumentation is supported to log traces for any ingestion or retrieval step (<https://docs.arize.com/phoenix/tracing/how-to-tracing/manual-instrumentation/custom-spans|docs>).

Arize and Phoenix also allow exporting and importing document ingestion and retrieval data, supporting both DataFrame and OpenTelemetry-based ingestion. This makes it easier to benchmark, compare, and experiment with different ingestion strategies, chunking, and retrieval parameters, and to track changes over time (<https://github.com/Arize-ai/phoenix/blob/main/docs/how-to/define-your-schema/retrieval-rag.md|retrieval RAG schema>, <https://github.com/Arize-ai/phoenix/blob/main/docs/tracing/use-cases-tracing/rag-evaluation.md|RAG evaluation>).

Would you like more detail on how to set up tracing or view ingestion metrics?

Sources:
- <https://github.com/Arize-ai/phoenix/blob/main/docs/tracing/use-cases-tracing/rag-evaluation.md|RAG evaluation with Phoenix>
- <https://github.com/Arize-ai/phoenix/blob/main/docs/how-to/define-your-schema/retrieval-rag.md|Retrieval (RAG) schema and ingestion>
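
To make the manual instrumentation point in the answer above more concrete, here is a minimal sketch of recording an ingestion step on a span. It is not taken from the Phoenix docs. The `ingestion.*` attribute names, the `record_ingestion` helper, and the fixed-width chunker are all hypothetical, and `RecordingSpan` is only a stand-in so the example runs without extra packages. The helper relies on nothing beyond a `set_attribute(key, value)` method, which is also what an OpenTelemetry span exposes, so with OpenTelemetry installed you could presumably pass in the span from `tracer.start_as_current_span("ingest_document")` instead.

```python
import json
from typing import Any, Dict, List, Protocol


class SpanLike(Protocol):
    """Anything exposing set_attribute(key, value), as OpenTelemetry spans do."""

    def set_attribute(self, key: str, value: Any) -> None: ...


class RecordingSpan:
    """Stand-in span for this sketch: it only stores attributes in a dict."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.attributes: Dict[str, Any] = {}

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value


def chunk_text(text: str, size: int) -> List[str]:
    """Naive fixed-width chunking, just to give the span something to record."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def record_ingestion(span: SpanLike, doc_id: str, text: str, chunk_size: int) -> List[str]:
    """Chunk one document and describe the step on the given span."""
    chunks = chunk_text(text, chunk_size)
    # Hypothetical attribute names, chosen for illustration only.
    span.set_attribute("ingestion.document.id", doc_id)
    span.set_attribute("ingestion.chunk_size", chunk_size)
    span.set_attribute("ingestion.chunk_count", len(chunks))
    # Nested metadata is serialized to a JSON string so the value stays a primitive.
    span.set_attribute("ingestion.metadata", json.dumps({"chars": len(text)}))
    return chunks


text = "Phoenix traces RAG pipelines."
span = RecordingSpan("ingest_document")
chunks = record_ingestion(span, "doc-001", text, chunk_size=10)
print(span.attributes)
```

Recording the chunk size and chunk count per document is one way to compare ingestion strategies later by filtering traces on those attributes.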