Hi, in Arize AI, while trying to log evals back to traces I am getting a weird error in the pyarrow Flight logger that says it "cannot write WriteSpanRequest".
The error you're seeing when logging evals ("cannot write WriteSpanRequest" or similar pyarrow logger errors) typically occurs when the evaluation DataFrame is missing required fields, most importantly the span_id column, which is needed to associate evaluations with traces. The span_id must be present and in the correct format (a hex string, not an integer or bytes) for Phoenix to log evaluations successfully. If you create the DataFrame manually, ensure span_id is included and properly formatted; otherwise Phoenix cannot join the evals to the correct spans, leading to serialization or write errors in pyarrow or the logger. See this Slack thread and the Phoenix evaluation docs for details.

Would you like more detail on how to extract or format span_id, or on troubleshooting pyarrow errors?

Sources:
- https://arize-ai.slack.com/archives/C04R3GXC8HK/p1721332908.765159
- Phoenix evaluation docs
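On the span_id formatting point, here is a minimal sketch of normalizing IDs to hex strings before logging. The helper name and the sample values are illustrative only, not part of the Arize or Phoenix SDKs:

```python
import pandas as pd

def to_hex_span_id(value) -> str:
    """Normalize a span ID to the lowercase hex string format expected
    when joining evals to spans (not an int, not raw bytes)."""
    if isinstance(value, bytes):
        return value.hex()
    if isinstance(value, int):
        return format(value, "016x")  # zero-padded 16-char hex
    return str(value)

# Illustrative inputs: the same span ID as an int, as bytes, and as a string.
raw_ids = [0x34DA4809FA1BBCA3, b"\x34\xda\x48\x09\xfa\x1b\xbc\xa3", "34da4809fa1bbca3"]
eval_df = pd.DataFrame({"context.span_id": [to_hex_span_id(v) for v in raw_ids]})
```

All three rows end up as the same "34da4809fa1bbca3" string, which is the shape you want before handing the DataFrame to the logger.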
💡 Hint: Mention RunLLM in the thread for followups.
Some snippets from your code, particularly for how you ran the evals on traces, would be helpful. For now, I suggest validating your code against the following checklist:
context.span_id should contain strings
label columns (eval.<name>.label) should only contain strings; score columns (eval.<name>.score) should be ints/floats; explanation columns (eval.<name>.explanation) should be strings
manually cross-check some context.span_id values against your actual traces in the Phoenix UI.
there are a few more complicated cases you could be hitting but it's hard to diagnose without your code.
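The checklist above can be mechanized with a small helper. A sketch — the function name and return convention are mine, and the column patterns are the ones from the checklist, not an official schema:

```python
import pandas as pd

def validate_eval_df(df: pd.DataFrame) -> list:
    """Check an evals DataFrame against the checklist:
    string span IDs, string labels/explanations, numeric scores."""
    problems = []
    if "context.span_id" not in df.columns:
        problems.append("missing context.span_id column")
    elif not df["context.span_id"].map(lambda v: isinstance(v, str)).all():
        problems.append("context.span_id must contain strings")
    for col in df.columns:
        if col.endswith((".label", ".explanation")):
            if not df[col].map(lambda v: isinstance(v, str)).all():
                problems.append(f"{col} must contain strings")
        elif col.endswith(".score"):
            if not pd.api.types.is_numeric_dtype(df[col]):
                problems.append(f"{col} must contain ints/floats")
    return problems
```

Running it on your DataFrame right before the log call gives you a concrete list of violations (empty list means the basic checks pass), which is easier than eyeballing dtypes.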
```python
def _log_to_arize_sync(self, eval_df: pd.DataFrame) -> bool:
    """
    Synchronous logging to Arize using the log_evaluations_sync API.

    Follows the pattern from Arize's trace-level-evals tutorial:
        arize_client.log_evaluations_sync(dataframe=log_df, model_id=model_id)
    """
    try:
        # Arize Platform SDK: client.log_evaluations_sync()
        response = self.client.log_evaluations_sync(
            dataframe=eval_df,
            model_id=self.model_id
        )
        # Check response status
        if response:
            logger.debug(f"Arize log_evaluations_sync successful: {response}")
            return True
        else:
            # Note: response is falsy here (likely None), so don't
            # dereference attributes like response.status_code on it.
            logger.error("Arize log_evaluations_sync returned no response")
            return False
    except Exception as e:
        logger.error(f"Arize API error: {e}", exc_info=True)
        return False

def _build_arize_evaluations_dataframe(
    self,
    trace_id: str,
    runtime_id: str,
    evaluation_result: Dict[str, Any],
    trace_metadata: Dict[str, Any]
) -> pd.DataFrame:
    """
    Build DataFrame in Arize Platform trace-level evaluations format.

    Following the pattern from Arize's trace-level-evals tutorial:
    https://docs.arize.com/arize/llm-large-language-models/llm-traces/trace-evaluations

    Column naming pattern: trace_eval.{EvalName}.label and
    trace_eval.{EvalName}.explanation

    Example columns:
    - context.trace_id: The trace ID
    - context.span_id: The root span ID
    - trace_eval.StepBudgeting.label: "correct" or "incorrect"
    - trace_eval.StepBudgeting.explanation: Explanation text

    Creates ONE ROW per trace with all guardrail evaluations as columns.
    """
    guardrail_results = evaluation_result.get("guardrail_results", [])

    span_id = trace_metadata.get("span_id")
    if not span_id:
        # Try alternative field names
        span_id = trace_metadata.get("root_span_id")
        if not span_id:
            span_id = trace_id[:16] if len(trace_id) > 16 else trace_id
            logger.warning(f"span_id not found in trace_metadata, using derived value: {span_id}")

    # Start with trace identifiers
    row_data = {
        "context.trace_id": trace_id,
        "context.span_id": span_id,
    }

    # Add each guardrail as separate columns (label + explanation)
    for gr in guardrail_results:
        guardrail_name = gr.get("guardrail_name", "unknown")
        status = gr.get("status", "unknown")

        # Map status to Arize label format (correct/incorrect)
        label = "correct" if status == "passed" else "incorrect"
        explanation = gr.get("message", "No details provided")

        # Convert guardrail_name to PascalCase for column name
        # e.g., "step_budgeting" -> "StepBudgeting"
        eval_name = "".join(word.capitalize() for word in guardrail_name.split("_"))

        # Add columns following Arize naming pattern
        row_data[f"trace_eval.{eval_name}.label"] = label
        row_data[f"trace_eval.{eval_name}.explanation"] = explanation

    # Create single-row DataFrame (one row per trace)
    df = pd.DataFrame([row_data])

    logger.debug(f"Built Arize evaluations: {len(guardrail_results)} guardrails for trace {trace_id}")
    logger.debug(f"Columns: {list(df.columns)}")
    return df
```

🔒[private user] Below is how the Arize client is created:

```python
self.client = ArizeClient(
    space_id=self.space_id,
    api_key=self.api_key
)
```

using https://app.arize.com/
```
Testing SPAN-Level Evaluations
================================================================================
1️⃣ Configuration:
   Space ID:
   API Key:
   Model ID:
   Span ID: 34da4809fa1bbca3

2️⃣ Creating SPAN-level evaluation DataFrame...
   Format: eval.{Name}.label
✅ DataFrame created:
   Shape: (1, 3)
   Columns: ['context.span_id', 'eval.StepBudgeting.label', 'eval.StepBudgeting.explanation']

📋 Data preview:
   context.span_id   eval.StepBudgeting.label  eval.StepBudgeting.explanation
   34da4809fa1bbca3  correct                   Agent used 5 out of 10 allowed steps - PASS

3️⃣ Initializing Arize Client...
✅ Client initialized

4️⃣ Logging evaluations to Arize...
📤 Calling log_evaluations_sync()...
Error logging evaluation data to Arize
Traceback (most recent call last):
  File "C:\Users\bbb\Desktop\G-sdk\ag-sdk\gs\Lib\site-packages\arize\pandas\logger.py", line 1810, in _log_arrow_flight
    with flight_writer:
         ^^^^^^^^^^^^^
  File "pyarrow/ipc.pxi", line 630, in pyarrow.lib._CRecordBatchWriter.__exit__
  File "pyarrow/_flight.pyx", line 1276, in pyarrow._flight.MetadataRecordBatchWriter.close
  File "pyarrow/_flight.pyx", line 55, in pyarrow._flight.check_flight_status
pyarrow._flight.FlightInternalError: Flight returned internal error, with message: failed to write WriteSpanEvaluationRequest. gRPC client debug context: UNKNOWN:Error received from peer ipv4:34.117.96.240:443 {created_time:"2025-11-04T06:04:20.273324+00:00", grpc_status:13, grpc_message:"failed to write WriteSpanEvaluationRequest"}. Client context: OK

📥 Response: None
⚠️ Response is None
Check Arize dashboard to verify if evaluations appeared
```
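One difference worth noting: the failing test builds a DataFrame with only context.span_id, while the _build_arize_evaluations_dataframe helper earlier in the thread also includes context.trace_id. Whether that is the cause of this particular gRPC failure is not certain, but adding it to the test costs little. A sketch, where the trace ID below is a placeholder, not a real value:

```python
import pandas as pd

# Same row as the failing test, plus a context.trace_id column
# (placeholder 32-char hex; substitute the real trace ID from Phoenix/Arize).
eval_df = pd.DataFrame([{
    "context.trace_id": "0123456789abcdef0123456789abcdef",
    "context.span_id": "34da4809fa1bbca3",
    "eval.StepBudgeting.label": "correct",
    "eval.StepBudgeting.explanation": "Agent used 5 out of 10 allowed steps - PASS",
}])
```

If the write still fails with both ID columns present and correctly typed, that points away from DataFrame shape and toward the span/trace IDs not matching anything already ingested on the platform.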
