Error with Local Model Using Evals API: Tool Call Requirement
I am getting a strange error when using a local open-source model (in this case Qwen3-30B) for evals with the new Evals API. Everything works normally with OpenAI/Azure models. The model I'm using is quite capable and has no trouble with tool calling in other contexts, yet I consistently get an error saying the response must contain tool calls. Is this related to how structured/formatted output is generated? I can reduce the frequency of the error to roughly 10% by adding something like "Call any tool" to the prompt, but since that is not exactly a gentle solution, I wanted to ask whether anyone is aware of this issue. This is my setup, roughly:
import os
from phoenix.evals.llm import LLM

# LLM wrapper pointed at a local OpenAI-compatible endpoint serving Qwen3-30B
model_name = os.getenv("MODEL_NAME_JUDGE")
api_key = os.getenv("OPENAI_KEY_JUDGE", "open")
base_url = os.getenv("OPENAI_BASE_URL")
eval_model = LLM(api_key=api_key, base_url=base_url, model=model_name, provider="openai")

print(eval_model.generate_classification(
    prompt="Test prompt.",
    labels=["correct", "not_correct"],
    include_explanation=False,
))
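If it helps with debugging, here is a minimal sketch of how I would test the local endpoint directly with a forced tool call, bypassing the Evals API entirely. This is just my approximation, not how the phoenix adapter actually builds its request; the tool name and schema below are made up for illustration, and I'm assuming the local server honors tool_choice="required":

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("OPENAI_KEY_JUDGE", "open"),
    base_url=os.getenv("OPENAI_BASE_URL"),
)

response = client.chat.completions.create(
    model=os.getenv("MODEL_NAME_JUDGE"),
    messages=[{"role": "user", "content": "Test prompt."}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "record_classification",  # hypothetical tool name
                "parameters": {
                    "type": "object",
                    "properties": {
                        "label": {"type": "string", "enum": ["correct", "not_correct"]}
                    },
                    "required": ["label"],
                },
            },
        }
    ],
    tool_choice="required",  # force a tool call in the response
)

# If the local server/model ignores the forced tool choice, this is None/empty,
# which would match the "No tool calls in response" error below.
print(response.choices[0].message.tool_calls)

The full traceback I get from the evaluator run: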
File "C:\Users\bizis01\Desktop\request-to-mdx\.venv\Lib\site-packages\phoenix\experiments\functions.py", line 790, in async_evaluate_run
result = await evaluator.async_evaluate(...)
File "C:\Users\bizis01\Desktop\request-to-mdx\.venv\Lib\site-packages\phoenix\experiments\evaluators\base.py", line 77, in async_evaluate
return self.evaluate(output=output, **kwargs)
File "C:\Users\bizis01\Desktop\request-to-mdx\.venv\Lib\site-packages\phoenix\experiments\evaluators\utils.py", line 215, in evaluate
result = func(*bound_signature.args, **bound_signature.kwargs)
File "C:\Users\bizis01\Desktop\request-to-mdx\src\text_to_mdx\experimentation\tests\correctness.py", line 48, in correctness
response = eval_model.generate_classification(
prompt="Test prompt.",
labels=["correct", "not_correct"],
include_explanation=False
)
File "C:\Users\bizis01\Desktop\request-to-mdx\.venv\Lib\site-packages\phoenix\evals\llm\wrapper.py", line 288, in generate_classification
result = self.generate_object(prompt, schema, **kwargs)
File "C:\Users\bizis01\Desktop\request-to-mdx\.venv\Lib\site-packages\phoenix\evals\tracing.py", line 153, in _wrapper_sync
result = func(*args, **kwargs)
File "C:\Users\bizis01\Desktop\request-to-mdx\.venv\Lib\site-packages\phoenix\evals\llm\wrapper.py", line 246, in generate_object
return rate_limited_generate(prompt, schema, **kwargs)
File "C:\Users\bizis01\Desktop\request-to-mdx\.venv\Lib\site-packages\phoenix\evals\rate_limiters.py", line 218, in wrapper
return fn(*args, **kwargs)
File "C:\Users\bizis01\Desktop\request-to-mdx\.venv\Lib\site-packages\phoenix\evals\llm\adapters\openai\adapter.py", line 163, in generate_object
return self._generate_with_tool_calling(prompt, schema, **kwargs)
File "C:\Users\bizis01\Desktop\request-to-mdx\.venv\Lib\site-packages\phoenix\evals\llm\adapters\openai\adapter.py", line 264, in _generate_with_tool_calling
raise ValueError("No tool calls in response")
ValueError: No tool calls in response
The above exception was the direct cause of the following exception:
RuntimeError: evaluator failed for example id 'RGF0YXNldEV4YW1wbGU6OQ==', repetition 1
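For completeness, this is roughly the prompt-level workaround mentioned above (the exact wording of the nudge is just what I happened to use); it only reduces the failure rate to about 10% and is clearly not a proper fix:

# Hacky mitigation: nudge the model toward emitting a tool call via the prompt text.
response = eval_model.generate_classification(
    prompt="Call any tool. Test prompt.",
    labels=["correct", "not_correct"],
    include_explanation=False,
)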