Second question: it seems the eval explanation gets trimmed. Is there a way to see the full value?
Kyle J., this is also likely related to how the LLM response is parsed when not using tool calling. You should be able to set use_function_calling_if_available=True when you run llm_classify with your evaluator. What model are you using?
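For reference, a minimal sketch of what that looks like with the Phoenix evals SDK (assuming the built-in QA correctness template and an OpenAI model wrapper; the dataframe columns and model class here are illustrative, so adjust them to your setup):

```python
import pandas as pd
from phoenix.evals import (
    QA_PROMPT_RAILS_MAP,
    QA_PROMPT_TEMPLATE,
    OpenAIModel,
    llm_classify,
)

# Hypothetical data; the QA template expects question, answer, and reference columns.
df = pd.DataFrame(
    {
        "input": ["What is the capital of France?"],
        "output": ["Paris is the capital of France."],
        "reference": ["Paris is the capital and largest city of France."],
    }
)

# Allowed labels ("correct" / "incorrect") that the parser snaps the LLM output to.
rails = list(QA_PROMPT_RAILS_MAP.values())

results = llm_classify(
    dataframe=df,
    template=QA_PROMPT_TEMPLATE,
    model=OpenAIModel(model="gpt-4o-mini"),  # swap in your own model wrapper
    rails=rails,
    provide_explanation=True,
    # Ask for structured (function-call) output so the label isn't lost in free text.
    use_function_calling_if_available=True,
)
```

With function calling enabled, the label comes back as a structured field rather than being scraped out of the explanation text, which avoids the NOT_PARSABLE failures.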
Hey Kyle J., you can view the evaluator task logs by going into 'Evals & Tasks' and clicking 'View Logs'. I am debugging the issue and I see the failed evals give this explanation:
qa: The evaluation label is NOT_PARSABLE for 1 spans, which may be due to one or more of the following issues:
1. "Enable Function Calling" is disabled in the UI, so labels are not extracted correctly and snapped to rails. Enable Function Calling to resolve this.
2. The max tokens setting is too low, cutting off the LLM's output during the explanation before generating the label. Increase max tokens or toggle off explanations on the task to fix this.
3. Both rails appear in the explanation, confusing the parsing logic. Update the prompt to encourage the LLM to mention only one rail.
For spans with ids: ea5e66d60a902932
Hi 🔒[private user], any more suggestions? We are using gemini-2.5-flash internally to generate the output based on the input, so I would assume it should say that it was correct?
Yes, I wanted to use the same LLM to prove that I have Arize set up correctly for evaluation purposes. Then, once proven, I would switch to another LLM to get a 'second opinion'. I believe the QA template should be able to evaluate properly, but as of right now, I would say it is not working well. Is it possible to get a consultation session?
I'll message you directly.
thanks!
