Troubleshooting NOT_PARSABLE Labels in LLM Classify Process

Aditya T. · 2025-05-09T16:21:08.633Z

Hi, snap to rails is not working properly in `llm_classify` for me: most of the time I get `NOT_PARSABLE` for the label. Initially my labels were "correct" and "incorrect", and I saw an issue here saying that classify fails if the labels appear in the content or the explanation. So I changed the prompt to generate no explanation and a single-word output, either "green" or "red". Even then the label is `NOT_PARSABLE`. As expected, the generated output is just one word, but labeling still fails.

RunLLM
·
If your `llm_classify` call returns `NOT_PARSABLE` even when the LLM output is a single word (e.g., "green" or "red"), the most likely causes are: (1) the output does not exactly match the expected rails (case, whitespace, punctuation), or (2) the rails list passed to `llm_classify` does not match the possible outputs.

The `snap_to_rails` function is strict: it returns a label only if exactly one rail is found in the output, and returns `NOT_PARSABLE` if no rail or more than one rail is found (<https://docs.arize.com/phoenix/evaluation/how-to-evals/bring-your-own-evaluator#S%3A3|see docs>, <https://github.com/Arize-ai/phoenix/pull/1373|source>).

To debug, set `verbose=True` in `llm_classify` to print the LLM output and see how it is being parsed (<https://github.com/Arize-ai/phoenix/pull/1333|example>). Also make sure your rails are exactly `["green", "red"]` and the LLM output is precisely "green" or "red" (no quotes, no extra whitespace, no punctuation). If the output is correct and you still get `NOT_PARSABLE`, please share a sample output and the rails for further troubleshooting.

Would you like more detail or a code example?

Sources:
- <https://docs.arize.com/phoenix/evaluation/how-to-evals/bring-your-own-evaluator#S%3A3|Phoenix "Bring Your Own Evaluator" docs>
- <https://github.com/Arize-ai/phoenix/pull/1373|PR: fix rails code to be able to handle overlapping rails>
- <https://github.com/Arize-ai/phoenix/pull/1333|PR: improve logging for llm_eval_binary>
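To make the "exactly one rail" rule concrete, here is a small stand-alone approximation. It is not Phoenix's code: `snap_label` is a hypothetical name, and the case-insensitive substring matching and the longest-rail-first handling of overlapping rails (such as "correct" inside "incorrect") are assumptions modeled on the overlapping-rails PR linked above.

```python
NOT_PARSABLE = "NOT_PARSABLE"


def snap_label(raw_output, rails):
    """Return the one (lowercased) rail found in raw_output, or NOT_PARSABLE.

    Hypothetical approximation, not Phoenix's implementation: matching is
    assumed to be a case-insensitive substring search.
    """
    if not raw_output:
        return NOT_PARSABLE
    text = raw_output.lower()
    found = set()
    # Check longer rails first and cut each match out of the text, so that
    # "correct" is not also counted inside "incorrect" (overlapping rails).
    for rail in sorted({r.lower() for r in rails}, key=len, reverse=True):
        if rail in text:
            found.add(rail)
            text = text.replace(rail, "")
    # Zero matches or several distinct matches are both unparsable.
    if len(found) != 1:
        return NOT_PARSABLE
    return found.pop()


print(snap_label("red", ["green", "red"]))                # red
print(snap_label("green or red", ["green", "red"]))       # NOT_PARSABLE
print(snap_label("blue", ["green", "red"]))               # NOT_PARSABLE
print(snap_label("incorrect", ["correct", "incorrect"]))  # incorrect
```

The second call shows why an output that mentions both labels cannot be snapped, which is the same failure mode as labels leaking into an explanation.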
Dustin N.
·
Hi Aditya T., that might be because your `max_tokens` is too low, which prevents the LLM from returning the label, since we ask for the annotations first.
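A toy simulation of the failure Dustin describes: when the response puts the annotations before the label, a tight token budget can cut the text off before the label is ever written. Everything here is made up for illustration; `truncate_to_tokens` and `find_rail` are hypothetical helpers, and counting whitespace-separated words is only a rough proxy for how a real tokenizer counts tokens.

```python
NOT_PARSABLE = "NOT_PARSABLE"


def truncate_to_tokens(text, max_tokens):
    """Crude stand-in for a max_tokens limit: keep the first max_tokens
    whitespace-separated words. Real tokenizers split text differently."""
    return " ".join(text.split()[:max_tokens])


def find_rail(text, rails):
    """Return the only rail that appears in text (case-insensitive), else NOT_PARSABLE."""
    hits = [rail for rail in rails if rail in text.lower()]
    return hits[0] if len(hits) == 1 else NOT_PARSABLE


# Hypothetical response in an "explanation first, label last" format.
full_response = (
    "EXPLANATION: Both scripts read the input file, filter blank lines "
    "and print the count, but the Ruby script also sorts the output. "
    "LABEL: red"
)
rails = ["green", "red"]

for budget in (10, 100):
    truncated = truncate_to_tokens(full_response, budget)
    print(budget, "->", find_rail(truncated, rails))
# 10 -> NOT_PARSABLE
# 100 -> red
```

Even if the final label is a single token, the budget has to cover everything the model writes before it.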
Aditya T.
·
But the response is only one token?
Aditya T.
·
My prompt is also only a few tokens.
Dustin N.
·
If you set `provide_explanation` to `True`, you can see what the LLM outputs.
Dustin N.
·
Can you test to see which label is failing to snap?
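One way to answer "which label is failing to snap" is to run every raw output through a snapping check and list the rows that fail, showing each with `repr()` so hidden characters stay visible. This is a sketch under assumptions: `snap` and `failing_outputs` are hypothetical helpers, the snapping rule is an approximation (exactly one rail, case-insensitive substring match), and the sample outputs are invented.

```python
NOT_PARSABLE = "NOT_PARSABLE"


def snap(raw_output, rails):
    """Hypothetical snapping rule: exactly one rail must appear (case-insensitive)."""
    text = (raw_output or "").lower()
    hits = [rail for rail in rails if rail in text]
    return hits[0] if len(hits) == 1 else NOT_PARSABLE


def failing_outputs(raw_outputs, rails):
    """Return (row index, repr of raw output) for every output that does not snap."""
    return [
        (i, repr(raw))
        for i, raw in enumerate(raw_outputs)
        if snap(raw, rails) == NOT_PARSABLE
    ]


# Made-up raw outputs standing in for what the model returned per row.
raw_outputs = ["red", "green", "", "Red or green", None]
for index, shown in failing_outputs(raw_outputs, ["green", "red"]):
    print(f"row {index}: {shown}")
# row 2: ''
# row 3: 'Red or green'
# row 4: None
```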
Aditya T.
·
It is `True`, but in my prompt I have said that the response should be a single word.

Aditya T.

This is my prompt:

```python
LOGIC_LLM_JUDGE_PROMPT = """
In this task, you will be presented with 2 scripts one in ruby and one in typescript. Check if they achieve the same goal functionally

Your response should be a single word: either "green" or "red" and it should not include any other 
text or characters. "green" indicates that the 2 scripts are same functionally. "red" indicates they do not accomplish the same goal


[BEGIN DATA]
Ruby Script: 
{{{ruby_script}}}

Typescript Script: 
{{{ts_script}}}
[END DATA]

LABEL: "green" or "red"
"""
```

Classification template:

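```python
from phoenix.evals import ClassificationTemplate

# Custom {{{ }}} delimiters mark the template variables filled from dataframe columns.
template = ClassificationTemplate(
    rails=["green", "red"],
    template=LOGIC_LLM_JUDGE_PROMPT,
    delimiters=("{{{", "}}}"),
)
```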
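Because the template uses custom `{{{ }}}` delimiters, it can help to confirm that the variable names inside them match the dataframe's column names exactly. The helper below is hypothetical and not part of Phoenix; it only mimics the idea of delimiter-wrapped variables, and the column list stands in for something like `list(code_gen_df.columns)`.

```python
import re


def template_variables(template, delimiters=("{{{", "}}}")):
    """Return variable names wrapped in the given delimiters, in order of appearance.

    Hypothetical helper for checking a template against dataframe columns.
    """
    start, end = (re.escape(d) for d in delimiters)
    return re.findall(rf"{start}\s*(\w+)\s*{end}", template)


prompt = """Ruby Script:
{{{ruby_script}}}

Typescript Script:
{{{ts_script}}}
"""

columns = ["ruby_script", "ts_script"]  # e.g. list(code_gen_df.columns)
variables = template_variables(prompt)
missing = [name for name in variables if name not in columns]
print(variables)  # ['ruby_script', 'ts_script']
print(missing)    # []
```

A non-empty `missing` list would point at a column-name mismatch rather than a snapping problem.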

Eval:

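```python
from phoenix.evals import LiteLLMModel, llm_classify
from phoenix.trace import suppress_tracing

# Run the judge over every row of code_gen_df without tracing the eval calls themselves.
with suppress_tracing():
    logic_eval = llm_classify(
        dataframe=code_gen_df,
        template=template,
        rails=["green", "red"],
        model=LiteLLMModel(model="ollama/llama3.2:latest"),
        provide_explanation=True,
        include_prompt=True,
    )
```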

Dustin N.
·
Can you show what the LLM is generating?
Aditya T.
·
That's in the image I uploaded.
Aditya T.
·
Just one word: red.
Dustin N.
·
So snapping just does simple string matching. Strange question: can you copy and paste the output and manually check that it is equal to "red"?
Aditya T.
·
What do you mean? I didn't get you.
Dustin N.
·
You posted a screenshot of the output. Can you copy and paste the string from the screenshot and check that it's equal to "red"?
Aditya T.
·
Like, check for whitespace?
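A screenshot cannot show trailing newlines, capitals versus look-alikes, or zero-width characters, so a check on the copied string is more reliable. `repr()` prints escape sequences for whitespace and non-printable characters, and `unicodedata.name()` names each unexpected character. The helper name `describe_unexpected_chars` and the sample string are made up for illustration; paste the real output in their place.

```python
import unicodedata


def describe_unexpected_chars(text):
    """Return (index, name) for every character that is not a lowercase
    ASCII letter, so stray whitespace, quotes, capitals or zero-width
    characters become visible. Characters without a Unicode name, such as
    the newline, are shown with repr() instead."""
    return [
        (i, unicodedata.name(ch, repr(ch)))
        for i, ch in enumerate(text)
        if not ("a" <= ch <= "z")
    ]


# Replace this made-up sample with the string copied from the output.
# It ends in a zero-width space and a newline, neither visible on screen.
copied = "red\u200b\n"

print(repr(copied))
print(copied == "red")          # False
print(copied.strip() == "red")  # False: strip() removes the newline but not U+200B
print(describe_unexpected_chars(copied))
```

An empty list from `describe_unexpected_chars` together with `copied == "red"` being `True` would rule out hidden characters as the cause.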
