Yes, that is what I did to "fix" it. But it is still not 100% reliable. I do not understand why it is necessary to make a tool call in the first place.
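As far as I can tell from the traceback further down, generate_classification enforces the label schema through a forced OpenAI-style tool call (the adapter method is literally _generate_with_tool_calling), so a model that answers in plain text instead of emitting tool_calls trips the ValueError. A quick way to check whether the local endpoint honors a forced tool call at all is to hit it directly with the openai SDK. This is a minimal sketch under assumptions: an OpenAI-compatible server at OPENAI_BASE_URL, and a made-up record_label tool that only stands in for whatever phoenix-evals sends internally.

import os

from openai import OpenAI  # official openai SDK; works against any OpenAI-compatible server

client = OpenAI(
    api_key=os.getenv("OPENAI_KEY_JUDGE", "open"),
    base_url=os.getenv("OPENAI_BASE_URL"),
)

# Hypothetical tool mirroring the classification schema; the name and fields are
# illustrative only, not what phoenix-evals actually sends.
tools = [{
    "type": "function",
    "function": {
        "name": "record_label",
        "description": "Record the classification label.",
        "parameters": {
            "type": "object",
            "properties": {
                "label": {"type": "string", "enum": ["correct", "not_correct"]},
            },
            "required": ["label"],
        },
    },
}]

response = client.chat.completions.create(
    model=os.getenv("MODEL_NAME_JUDGE"),
    messages=[{"role": "user", "content": "Test prompt."}],
    tools=tools,
    # Force the tool call; if the server ignores this, you get the same
    # "No tool calls in response" failure mode as in the traceback below.
    tool_choice={"type": "function", "function": {"name": "record_label"}},
)

print(response.choices[0].message.tool_calls)

If tool_calls comes back as None here even with tool_choice forced, the issue is in the serving stack (e.g. the server's tool parser or chat template for Qwen3) rather than in phoenix-evals itself.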
I am getting a strange error when using a local open-source model (in this case Qwen3-30B) for evals with the new Evals API. Everything works normally with OpenAI/Azure models. The model I am using here is fairly capable and has no problem doing tool calling in other contexts. Apparently there need to be tool calls in the response, otherwise it does not work. Is this connected to generating formatted output? The error occurs consistently, but I can reduce its occurrence to roughly 10% if I add something like "Call any tool" to the prompt. Since that is not exactly a clean solution, I wanted to ask whether anyone is aware of this issue. This is my setup, roughly:
import os

# Import path assumes phoenix-evals 2.x; adjust if your version exports LLM elsewhere.
from phoenix.evals.llm import LLM

model_name = os.getenv("MODEL_NAME_JUDGE")
api_key = os.getenv("OPENAI_KEY_JUDGE", "open")
base_url = os.getenv("OPENAI_BASE_URL")

eval_model = LLM(api_key=api_key, base_url=base_url, model=model_name, provider="openai")

print(eval_model.generate_classification(
    prompt="Test prompt.",
    labels=["correct", "not_correct"],
    include_explanation=False,
))
File "C:\Users\bizis01\Desktop\request-to-mdx\.venv\Lib\site-packages\phoenix\experiments\functions.py", line 790, in async_evaluate_run
result = await evaluator.async_evaluate(...)
File "C:\Users\bizis01\Desktop\request-to-mdx\.venv\Lib\site-packages\phoenix\experiments\evaluators\base.py", line 77, in async_evaluate
return self.evaluate(output=output, **kwargs)
File "C:\Users\bizis01\Desktop\request-to-mdx\.venv\Lib\site-packages\phoenix\experiments\evaluators\utils.py", line 215, in evaluate
result = func(*bound_signature.args, **bound_signature.kwargs)
File "C:\Users\bizis01\Desktop\request-to-mdx\src\text_to_mdx\experimentation\tests\correctness.py", line 48, in correctness
response = eval_model.generate_classification(
prompt="Test prompt.",
labels=["correct", "not_correct"],
include_explanation=False
)
File "C:\Users\bizis01\Desktop\request-to-mdx\.venv\Lib\site-packages\phoenix\evals\llm\wrapper.py", line 288, in generate_classification
result = self.generate_object(prompt, schema, **kwargs)
File "C:\Users\bizis01\Desktop\request-to-mdx\.venv\Lib\site-packages\phoenix\evals\tracing.py", line 153, in _wrapper_sync
result = func(*args, **kwargs)
File "C:\Users\bizis01\Desktop\request-to-mdx\.venv\Lib\site-packages\phoenix\evals\llm\wrapper.py", line 246, in generate_object
return rate_limited_generate(prompt, schema, **kwargs)
File "C:\Users\bizis01\Desktop\request-to-mdx\.venv\Lib\site-packages\phoenix\evals\rate_limiters.py", line 218, in wrapper
return fn(*args, **kwargs)
File "C:\Users\bizis01\Desktop\request-to-mdx\.venv\Lib\site-packages\phoenix\evals\llm\adapters\openai\adapter.py", line 163, in generate_object
return self._generate_with_tool_calling(prompt, schema, **kwargs)
File "C:\Users\bizis01\Desktop\request-to-mdx\.venv\Lib\site-packages\phoenix\evals\llm\adapters\openai\adapter.py", line 264, in _generate_with_tool_calling
raise ValueError("No tool calls in response")
ValueError: No tool calls in response
The above exception was the direct cause of the following exception:
RuntimeError: evaluator failed for example id 'RGF0YXNldEV4YW1wbGU6OQ==', repetition 1

I’m running into a 404 error when using the new evals library; the connection to the endpoint fails:
eval_model = LLM(
model=model_name,
provider="azure",
client="openai",
api_version=api_version,
api_key=api_key,
base_url=azure_base_url
)
eval_model.generate_classification(...)

I’m using the exact same parameters that work with OpenAIModel in the legacy evals library. Could this be related to using chat-completions vs. responses?
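Not an answer to the chat-completions vs. responses question, but one way to rule the Phoenix wrapper in or out is to hit the Azure deployment directly with the openai SDK using the very same values. A minimal sketch, assuming azure_base_url is the bare resource endpoint (https://<resource>.openai.azure.com) and model_name is the deployment name:

from openai import AzureOpenAI  # official openai SDK

client = AzureOpenAI(
    api_key=api_key,
    api_version=api_version,        # same value passed to LLM(...) above
    azure_endpoint=azure_base_url,  # bare resource endpoint, no /openai/deployments/... suffix
)

# For Azure, "model" must be the deployment name, not the base model name.
response = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": "ping"}],
)
print(response.choices[0].message.content)

If this direct call also 404s, the usual suspects are a base_url that already includes /openai/deployments/<name> (so the path gets doubled), a deployment name that doesn't match, or an api-version the resource doesn't accept; if it succeeds, the problem is more likely in how the new evals library builds the Azure URL.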
RunLLM, the issue and PR you shared only talk about the reasoning models (o1, o3, o4). I have installed the latest version, and there is still no support for gpt-5 models.
Hi everyone, I’m running an eval_model with llm_classify against the Azure OpenAI API, and I’m getting this error when using gpt-5-mini:
Unsupported parameter: 'max_tokens' is not supported with this model.
Use 'max_completion_tokens' instead.

Digging deeper, I found this method in the codebase (OpenAIModel):
def _get_token_param_str(is_azure: bool, model: str) -> str:
    """
    Get the token parameter string for the given model.
    OpenAI o1 and o3 models made a switch to use
    max_completion_tokens and now all the models support it.
    However, Azure OpenAI models currently do not support
    max_completion_tokens unless it's an o1 or o3 model.
    """
    azure_reasoning_models = ("o1", "o3", "o4")
    if is_azure and not model.startswith(azure_reasoning_models):
        return "max_tokens"
    return "max_completion_tokens"

As you can see, on Azure it only uses max_completion_tokens for the reasoning models (o1, o3, o4). I couldn’t find anything in the Azure docs confirming this, but it seems like the Azure endpoint now requires max_completion_tokens for the GPT-5 family as well. Question: has anyone else experienced this or found updated documentation confirming that Azure GPT-5 models now require max_completion_tokens instead of max_tokens?
