Error with Local Model Using Evals API: Tool Call Requirement
I am getting a strange error when using a local open-source model (in this case Qwen3-30B) for evals with the new Evals API. Everything works normally with OpenAI/Azure models. The model I'm using is quite capable and has no trouble with tool calling in other contexts, yet I consistently get an error saying the response must contain tool calls. Is this related to how structured/formatted output is generated? I can reduce the frequency of the error to roughly 10% by adding something like "Call any tool" to the prompt, but since that is not exactly a gentle solution, I wanted to ask whether anyone is aware of this issue. This is my setup, roughly:
import os
from phoenix.evals.llm import LLM

# LLM wrapper pointed at a local OpenAI-compatible endpoint serving Qwen3-30B
model_name = os.getenv("MODEL_NAME_JUDGE")
api_key = os.getenv("OPENAI_KEY_JUDGE", "open")
base_url = os.getenv("OPENAI_BASE_URL")
eval_model = LLM(api_key=api_key, base_url=base_url, model=model_name, provider="openai")

print(eval_model.generate_classification(
    prompt="Test prompt.",
    labels=["correct", "not_correct"],
    include_explanation=False,
))
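If it helps with debugging, here is a minimal sketch of how I would test the local endpoint directly with a forced tool call, bypassing the Evals API entirely. This is just my approximation, not how the phoenix adapter actually builds its request; the tool name and schema below are made up for illustration, and I'm assuming the local server honors tool_choice="required":

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("OPENAI_KEY_JUDGE", "open"),
    base_url=os.getenv("OPENAI_BASE_URL"),
)

response = client.chat.completions.create(
    model=os.getenv("MODEL_NAME_JUDGE"),
    messages=[{"role": "user", "content": "Test prompt."}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "record_classification",  # hypothetical tool name
                "parameters": {
                    "type": "object",
                    "properties": {
                        "label": {"type": "string", "enum": ["correct", "not_correct"]}
                    },
                    "required": ["label"],
                },
            },
        }
    ],
    tool_choice="required",  # force a tool call in the response
)

# If the local server/model ignores the forced tool choice, this is None/empty,
# which would match the "No tool calls in response" error below.
print(response.choices[0].message.tool_calls)

The full traceback I get from the evaluator run: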
File "C:\Users\bizis01\Desktop\request-to-mdx\.venv\Lib\site-packages\phoenix\experiments\functions.py", line 790, in async_evaluate_run
result = await evaluator.async_evaluate(...)
File "C:\Users\bizis01\Desktop\request-to-mdx\.venv\Lib\site-packages\phoenix\experiments\evaluators\base.py", line 77, in async_evaluate
return self.evaluate(output=output, **kwargs)
File "C:\Users\bizis01\Desktop\request-to-mdx\.venv\Lib\site-packages\phoenix\experiments\evaluators\utils.py", line 215, in evaluate
result = func(*bound_signature.args, **bound_signature.kwargs)
File "C:\Users\bizis01\Desktop\request-to-mdx\src\text_to_mdx\experimentation\tests\correctness.py", line 48, in correctness
response = eval_model.generate_classification(
prompt="Test prompt.",
labels=["correct", "not_correct"],
include_explanation=False
)
File "C:\Users\bizis01\Desktop\request-to-mdx\.venv\Lib\site-packages\phoenix\evals\llm\wrapper.py", line 288, in generate_classification
result = self.generate_object(prompt, schema, **kwargs)
File "C:\Users\bizis01\Desktop\request-to-mdx\.venv\Lib\site-packages\phoenix\evals\tracing.py", line 153, in _wrapper_sync
result = func(*args, **kwargs)
File "C:\Users\bizis01\Desktop\request-to-mdx\.venv\Lib\site-packages\phoenix\evals\llm\wrapper.py", line 246, in generate_object
return rate_limited_generate(prompt, schema, **kwargs)
File "C:\Users\bizis01\Desktop\request-to-mdx\.venv\Lib\site-packages\phoenix\evals\rate_limiters.py", line 218, in wrapper
return fn(*args, **kwargs)
File "C:\Users\bizis01\Desktop\request-to-mdx\.venv\Lib\site-packages\phoenix\evals\llm\adapters\openai\adapter.py", line 163, in generate_object
return self._generate_with_tool_calling(prompt, schema, **kwargs)
File "C:\Users\bizis01\Desktop\request-to-mdx\.venv\Lib\site-packages\phoenix\evals\llm\adapters\openai\adapter.py", line 264, in _generate_with_tool_calling
raise ValueError("No tool calls in response")
ValueError: No tool calls in response
The above exception was the direct cause of the following exception:
RuntimeError: evaluator failed for example id 'RGF0YXNldEV4YW1wbGU6OQ==', repetition 1
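For completeness, this is roughly the prompt-level workaround mentioned above (the exact wording of the nudge is just what I happened to use); it only reduces the failure rate to about 10% and is clearly not a proper fix:

# Hacky mitigation: nudge the model toward emitting a tool call via the prompt text.
response = eval_model.generate_classification(
    prompt="Call any tool. Test prompt.",
    labels=["correct", "not_correct"],
    include_explanation=False,
)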