Hi, snap to rails is not working properly in llm_classify for me. I am getting NOT_PARSABLE for the label most of the time. Initially my labels were "correct" and "incorrect", and I saw an issue saying that classification fails if labels are present in the content or explanation. So I changed the prompt to not generate any explanation and to output a single word, either green or red. Even then the label is NOT_PARSABLE. As expected, the generated output is just one word, but labelling still fails.
If your llm_classify is returning NOT_PARSABLE even when the LLM output is a single word (e.g., "green" or "red"), the most likely causes are: (1) the output does not exactly match the expected rails (case, whitespace, punctuation), or (2) the rails list passed to llm_classify does not match the possible outputs. The snap_to_rails function is strict: it only returns a label if exactly one rail is found in the output, and returns NOT_PARSABLE if none or multiple rails are found, even if the output is a substring or has extra characters (see docs, source).

To debug, set verbose=True in llm_classify to print the LLM output and see how it is being parsed (example). Also, ensure your rails are exactly ["green", "red"] and the LLM output is precisely "green" or "red" (no quotes, no extra whitespace, no punctuation). If the output is correct and you still get NOT_PARSABLE, please share a sample output and rails for further troubleshooting. Would you like more detail or a code example?

Sources:
- Phoenix "Bring Your Own Evaluator" docs
- PR: fix rails code to be able to handle overlapping rails
- PR: improve logging for llm_eval_binary
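To make the "exactly one rail" rule above concrete, here is a minimal sketch. It is not Phoenix's actual snap_to_rails code: the function name snap_to_rails_sketch and the plain case-sensitive substring matching are assumptions made for illustration, and the real library may normalize case or whitespace differently.

```python
NOT_PARSABLE = "NOT_PARSABLE"


def snap_to_rails_sketch(raw_output, rails):
    """Hypothetical helper: return a rail only if exactly one rail occurs in the output.

    Assumption: plain, case-sensitive substring matching. This only illustrates
    the rule described above, not the library's implementation.
    """
    found = [rail for rail in rails if rail in raw_output]
    if len(found) == 1:
        return found[0]
    # Zero matches or several matches are both treated as unparsable.
    return NOT_PARSABLE


# One rail found: the label snaps.
print(snap_to_rails_sketch("red", ["green", "red"]))
# Overlapping rails: "correct" is a substring of "incorrect", so two rails match.
print(snap_to_rails_sketch("incorrect", ["correct", "incorrect"]))
# No rail found at all.
print(snap_to_rails_sketch("blue", ["green", "red"]))
```

Under this naive matching, overlapping rails such as "correct" and "incorrect" collide, which is the failure mode the original labels ran into.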
But response is only one token?
My prompt is also a few tokens only
If you set provide_explanation to True, you can see what the output of the LLM is.
can you test to see what label is failing to snap?
It is set to True, but in my prompt I have specified that the response should be a single word.
This is my prompt:
LOGIC_LLM_JUDGE_PROMPT = """
In this task, you will be presented with 2 scripts one in ruby and one in typescript. Check if they achieve the same goal functionally
Your response should be a single word: either "green" or "red" and it should not include any other
text or characters. "green" indicates that the 2 scripts are same functionally. "red" indicates they do not accomplish the same goal
[BEGIN DATA]
Ruby Script:
{{{ruby_script}}}
Typescript Script:
{{{ts_script}}}
[END DATA]
LABEL: "green" or "red"
"""

Classification template:
template = ClassificationTemplate(
    rails=['green', 'red'],
    template=LOGIC_LLM_JUDGE_PROMPT,
    delimiters=("{{{", "}}}"),
)

Eval:
with suppress_tracing():
    logic_eval = llm_classify(
        dataframe=code_gen_df,
        template=template,
        rails=['green', 'red'],
        model=LiteLLMModel(model="ollama/llama3.2:latest"),
        provide_explanation=True,
        include_prompt=True,
    )

Can you show what the LLM is generating?
That's in the image I uploaded.
Just one word: red
So snapping just does simple string matching. Strange question: can you copy and paste the output and check for equivalence with red manually?
How do you mean? I did not get you.
You posted a screenshot of the output. Can you copy and paste the string from that screenshot and check that it's equal to red?
Like check for whitespace?
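One way to do that manual check is sketched below. The helper name describe_mismatch and the sample string with a trailing zero-width space are made up for illustration; the real value would be the label text copied from the llm_classify output. Printing repr() and the code point of each character exposes whitespace or invisible characters that a screenshot cannot show.

```python
import unicodedata


def describe_mismatch(candidate, expected="red"):
    """Hypothetical helper: compare a pasted string with the expected rail."""
    chars = [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in candidate
    ]
    return {
        "equal": candidate == expected,  # exact string equality
        "repr": repr(candidate),         # escapes non-printable characters
        "length": len(candidate),        # "red" should have length 3
        "chars": chars,                  # one (char, code point, name) per character
    }


# Assumed sample: looks like "red" on screen but ends with a zero-width space.
pasted = "red\u200b"
report = describe_mismatch(pasted)
print(report["equal"], report["length"])
for ch, code_point, name in report["chars"]:
    print(code_point, name)
```

If the comparison is False or the length is not 3, the extra code points in the listing show which characters are getting in the way of the match.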
