Hey Kyle J., you can view the evaluator task logs under 'Evals & Tasks' by clicking 'View Logs'. I'm debugging the issue now, and the failed evals give this explanation:
qa: The evaluation label is NOT_PARSABLE for 1 spans, which may be due to one or more of the following issues:
1. "Enable Function Calling" is disabled in the UI, so labels are not extracted correctly and snapped to rails. Enable Function Calling to resolve this.
2. The max tokens setting is too low, cutting off the LLM's output during the explanation before generating the label. Increase max tokens or toggle off explanations on the task to fix this.
3. Both rails appear in the explanation, confusing the parsing logic. Update the prompt to encourage the LLM to mention only one rail.
For spans with ids: ea5e66d60a902932
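To make causes 2 and 3 concrete, here is a minimal, hypothetical sketch of what rail-snapping parsing typically looks like (this is an illustration under my own assumptions, not the actual evaluator implementation; the rail names "correct"/"incorrect" are made up):

```python
import re

# Hypothetical rail-snapping parser: the LLM's free-text output is
# searched for exactly one of the allowed rail labels.
def snap_to_rail(output: str, rails: list[str]) -> str:
    # Word-boundary match so "incorrect" does not also match "correct".
    found = [r for r in rails
             if re.search(rf"\b{re.escape(r)}\b", output.lower())]
    # Exactly one rail must appear. Zero matches (e.g. the output was
    # truncated by a low max-tokens setting before the label was emitted)
    # or two matches (both rails mentioned in the explanation) cannot be
    # parsed unambiguously.
    return found[0] if len(found) == 1 else "NOT_PARSABLE"

rails = ["correct", "incorrect"]
print(snap_to_rail("The answer is incorrect.", rails))            # incorrect
print(snap_to_rail("It could be correct or incorrect...", rails)) # NOT_PARSABLE
print(snap_to_rail("Reasoning: the response was cut off mi", rails))  # NOT_PARSABLE
```

This shows why a truncated explanation or an explanation that mentions both rails ends up as NOT_PARSABLE, and why enabling function calling (which returns the label as a structured field instead of free text) sidesteps the parsing entirely.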