Also a question on evaluations. I'm using Vertex AI's batch processing directly with some advanced features, this writes a file to GCS. So my workflow is batch process to caption a list of images followed by batch process with llm as judge on the output captions. I have multiple llm-as-judge evaluations I run. What is the best practice for importing these into Arize? Is there a way to incrementally add "columns" to a dataset or do I need to wait until all processing is done to upload? What about when I tune my llm-as-judge prompt? Still trying to grok some of the paradigms and integrate with my current processes.
You can incrementally add new evaluations (as "columns") to a Phoenix dataset by logging multiple evaluation DataFrames, either in separate calls as each batch job finishes or by concatenating them and uploading once. This lets you add results from different LLM-as-judge runs as they become available, without waiting for all processing to finish. When you tune your LLM-as-judge prompt and re-run evaluations, you can log the new results as additional annotations or columns, preserving previous results for comparison and tracking improvements over time. As a best practice, use the Phoenix client's log_span_annotations_dataframe or log_document_annotations_dataframe methods to upload each evaluation DataFrame as you generate it. If you want to upload everything at once instead, concatenate the DataFrames, making sure each carries annotation_name and annotator_kind columns. This approach supports iterative, experiment-driven workflows and is compatible with batch outputs from Vertex AI or other sources. See the official documentation and guides for code examples on logging multiple evaluation DataFrames and managing incremental updates to datasets (source 1, source 2). Would you like a step-by-step code example for this workflow?
Sources:
- Logging Multiple Evaluation DataFrames
- Logging Multiple Evaluation DataFrames (GitHub)
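A minimal sketch of the incremental workflow described above, using pandas to shape each evaluation run as an annotation DataFrame. The span IDs, annotation names, and the exact column schema are hypothetical, and the Phoenix client call is shown commented out since it assumes a running Phoenix instance and may vary by client version:

```python
import pandas as pd

# Two hypothetical LLM-as-judge runs over the same captioning spans.
# Each run becomes its own "column" via a distinct annotation_name.
relevance_evals = pd.DataFrame({
    "span_id": ["a1", "a2"],             # placeholder span IDs
    "annotation_name": "caption_relevance",
    "annotator_kind": "LLM",
    "label": ["good", "bad"],
    "score": [1.0, 0.0],
})
accuracy_evals = pd.DataFrame({
    "span_id": ["a1", "a2"],
    "annotation_name": "caption_accuracy",
    "annotator_kind": "LLM",
    "label": ["accurate", "inaccurate"],
    "score": [1.0, 0.0],
})

# Option A: upload each DataFrame as soon as its batch job finishes
# (illustrative only; check your Phoenix client's signature):
# from phoenix.client import Client
# client = Client()
# client.annotations.log_span_annotations_dataframe(dataframe=relevance_evals)
# client.annotations.log_span_annotations_dataframe(dataframe=accuracy_evals)

# Option B: concatenate and upload once, after all batch jobs are done.
all_evals = pd.concat([relevance_evals, accuracy_evals], ignore_index=True)
print(sorted(all_evals["annotation_name"].unique()))
```

When you later tune the judge prompt, re-running with a new annotation_name (e.g. "caption_relevance_v2") keeps the old results alongside the new ones for comparison.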
moved to Arize AX Support
