I think in my examples it's because the LLM didn't strictly follow the response format, so the label was parsed as part of the reasoning.
This was the end of the prompt:
Example response:
************
EXPLANATION: An explanation of your reasoning for why the label is "relevant" or "unrelated"
LABEL: "relevant" or "unrelated"
************
EXPLANATION:
It seems the LLM sometimes doesn't start `LABEL: ` on a new line, so the label ends up on the same line as the explanation.
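
For what it's worth, here is a rough sketch of a parser that would tolerate this. It is not the library's actual parsing code; `parse_response` and the regex are hypothetical names I made up for illustration, and it assumes the only valid labels are "relevant" and "unrelated" as in the prompt above. The idea is to search for `LABEL:` anywhere in the text instead of only at the start of a line, and to take the last occurrence as the answer.

```python
import re

# Hypothetical pattern: matches LABEL: anywhere in the text, not only at a
# line start, with optional quotes around the label value.
_LABEL_RE = re.compile(r'LABEL:\s*"?(relevant|unrelated)"?', re.IGNORECASE)


def parse_response(text):
    """Split an LLM response into (explanation, label).

    Returns label=None when no recognizable label is found, so the caller
    can decide how to handle an unparseable response.
    """
    matches = list(_LABEL_RE.finditer(text))
    if not matches:
        return text.strip(), None

    # Use the last match, in case the explanation itself quotes "LABEL:".
    last = matches[-1]
    explanation = text[:last.start()]
    # The prompt ends with "EXPLANATION:", so the prefix may or may not be
    # repeated in the response; strip it if present.
    explanation = re.sub(
        r"^\s*EXPLANATION:\s*", "", explanation, flags=re.IGNORECASE
    ).strip()
    return explanation, last.group(1).lower()
```

With this, a response where the label stays on the same line as the explanation still gets split into the two fields, and a response with no label at all comes back with `None` rather than a wrong label.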