Second question: it seems the eval explanation gets trimmed. Is there a way to see the full value?
Kyle J., this is also likely related to how the LLM response is parsed when not using tool calling. You should be able to set use_function_calling_if_available=True when you run llm_classify with your evaluator. What model are you using?
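For reference, a minimal sketch of what that looks like with the Phoenix evals SDK (assuming the built-in QA correctness template and an OpenAI model wrapper; the dataframe columns and model class here are illustrative, so adjust them to your setup):

```python
import pandas as pd
from phoenix.evals import (
    QA_PROMPT_RAILS_MAP,
    QA_PROMPT_TEMPLATE,
    OpenAIModel,
    llm_classify,
)

# Hypothetical data; the QA template expects question, answer, and reference columns.
df = pd.DataFrame(
    {
        "input": ["What is the capital of France?"],
        "output": ["Paris is the capital of France."],
        "reference": ["Paris is the capital and largest city of France."],
    }
)

# Allowed labels ("correct" / "incorrect") that the parser snaps the LLM output to.
rails = list(QA_PROMPT_RAILS_MAP.values())

results = llm_classify(
    dataframe=df,
    template=QA_PROMPT_TEMPLATE,
    model=OpenAIModel(model="gpt-4o-mini"),  # swap in your own model wrapper
    rails=rails,
    provide_explanation=True,
    # Ask for structured (function-call) output so the label isn't lost in free text.
    use_function_calling_if_available=True,
)
```

With function calling enabled, the label comes back as a structured field rather than being scraped out of the explanation text, which avoids the NOT_PARSABLE failures.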
Hey Kyle J., you can view the evaluator task logs by going into 'Evals & Tasks' and clicking 'View Logs'. I am debugging the issue and I see the failed evals give this explanation:
qa: The evaluation label is NOT_PARSABLE for 1 spans, which may be due to one or more of the following issues:
1. "Enable Function Calling" is disabled in the UI, so labels are not extracted correctly and snapped to rails. Enable Function Calling to resolve this.
2. The max tokens setting is too low, cutting off the LLM's output during the explanation before generating the label. Increase max tokens or toggle off explanations on the task to fix this.
3. Both rails appear in the explanation, confusing the parsing logic. Update the prompt to encourage the LLM to mention only one rail.
For spans with ids: ea5e66d60a902932
Hi 🔒[private user], any more suggestions? We are using gemini-2.5-flash internally to generate the output based on the input, so I would assume it should say that it was correct?
Yes, I wanted to use the same LLM to prove that I have Arize set up correctly for evaluation purposes. Then, once proven, I would switch to another LLM to get a 'second opinion'. I believe the QA template should be able to evaluate properly, but as of right now, I would say it is not working well. Is it possible to get a consultation session?
I'll message you directly.
thanks!
