How can I reduce NOT_PARSABLE responses during relevance evaluation? I'm using gpt-4o, and most of the time the LLM gives the label at the end of its reasoning, like `LABEL: relevant`.
Xiaohan W. `llm_classify` has a "verbose" flag. Can you set that to True and look at the generated output? If it's OK to share, it would be helpful to see. It's trying to find your rail values ("incorrect" or "correct") in the strings returned by the LLM, so "...Incorrect!" rails to "incorrect". NOT_PARSABLE means the expected output "rails" are not in the string. A common issue is that your template asks for "right" or "wrong" but you set the rails to "correct" or "incorrect".
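To make the rail-matching behavior concrete, here is a minimal sketch of the idea described above. This is an illustration only, not Phoenix's actual implementation; the function name `snap_to_rail` and its exact behavior are assumptions:

```python
def snap_to_rail(raw_output: str, rails: list[str]) -> str:
    """Find which rail value appears in the LLM's raw output.

    Returns the matching rail, or "NOT_PARSABLE" when none of the
    rails can be found in the string (hypothetical sketch, not the
    real Phoenix logic).
    """
    text = raw_output.lower()
    # Check longer rails first so "incorrect" wins over its
    # substring "correct".
    for rail in sorted(rails, key=len, reverse=True):
        if rail.lower() in text:
            return rail
    return "NOT_PARSABLE"


rails = ["correct", "incorrect"]
print(snap_to_rail("...Incorrect!", rails))        # incorrect
print(snap_to_rail("The answer is right.", rails))  # NOT_PARSABLE
```

This is why a template that asks for "right"/"wrong" while the rails are set to "correct"/"incorrect" produces NOT_PARSABLE: the rail strings simply never appear in the output.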
The other option to get a little more information is to add `provide_explanation=True`. It will give some indication of the LLM's reasoning.
I think in my examples it's because the LLM didn't strictly follow the response format, so the label was parsed as part of the reasoning. This was the end of the prompt:
Example response:
************
EXPLANATION: An explanation of your reasoning for why the label is "relevant" or "unrelated"
LABEL: "relevant" or "unrelated"
************
It seems the LLM sometimes doesn't start `LABEL: ` on a new line; it gets appended to the end of the `EXPLANATION:` text instead.
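One workaround for that failure mode is to post-process the raw output with a regex that finds `LABEL:` anywhere in the text, not just at the start of a line. A hedged sketch (the helper name `extract_label` is made up for illustration; the rail values match the template above):

```python
import re


def extract_label(raw_output: str, rails: list[str]) -> str:
    """Pull the value after 'LABEL:' wherever it appears, even
    mid-line, e.g. '...reasoning. LABEL: relevant'.

    Falls back to "NOT_PARSABLE" if no rail value is found.
    """
    match = re.search(r'LABEL:\s*"?([A-Za-z_]+)"?', raw_output)
    if match and match.group(1).lower() in {r.lower() for r in rails}:
        return match.group(1).lower()
    return "NOT_PARSABLE"


rails = ["relevant", "unrelated"]
print(extract_label(
    'EXPLANATION: The document answers the query. LABEL: "relevant"',
    rails,
))  # relevant
```

Because the regex anchors on the `LABEL:` keyword rather than a line boundary, it still recovers the label when the model runs the explanation and label together on one line.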
Does the LLM you're using support function calls?
We don't yet support eval explanations using function calling on all of the latest models that rolled out support, but knowing what people are using is helpful.
