I'm getting errors while evaluating with Phoenix, but a few hours ago the same file ran fine without any errors!
Hey Sangram, we can't really help if the errors are cut off. Can you paste the full stack traces for us? That will help a lot, because as engineers we can't tell where things are going wrong from an image of the failed line alone. From what I can tell, it looks like the dataframe you're passing in doesn't have sufficient columns, but I can't be sure. As our engineers Dustin N. and Xander S. come online, they'll need your help getting the details to reproduce this. Would you mind filling out a bug report with the details? This will help a lot! https://github.com/Arize-ai/phoenix/issues/new/choose
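In the meantime, one quick thing you can check yourself: the built-in evaluators fill their prompt templates from specific dataframe columns. Here's a minimal sanity check, assuming your dataframe is called queries_df; "input", "reference", and "output" are the columns the built-in hallucination/QA templates typically expect, so adjust if your evaluator uses a custom template:

# Check that the dataframe has the columns the eval templates expect.
# The names below are assumptions for the built-in templates;
# adjust them if you're using a custom template.
expected = {"input", "reference", "output"}
missing = expected - set(queries_df.columns)
if missing:
    print(f"Dataframe is missing columns: {missing}")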
Exception in worker: Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/phoenix/experimental/evals/functions/executor.py", line 127, in consumer
output[index] = generate_task.result()
File "/usr/lib/python3.10/asyncio/futures.py", line 201, in result
raise self._exception.with_traceback(self._exception_tb)
File "/usr/lib/python3.10/asyncio/tasks.py", line 232, in __step
result = coro.send(None)
File "/usr/local/lib/python3.10/dist-packages/phoenix/experimental/evals/functions/classify.py", line 408, in _arun_eval
label, score, explanation = await payload.evaluator.aevaluate(
File "/usr/local/lib/python3.10/dist-packages/phoenix/experimental/evals/evaluators.py", line 145, in aevaluate
unparsed_output = await verbose_model._async_generate(
File "/usr/local/lib/python3.10/dist-packages/phoenix/experimental/evals/models/openai.py", line 255, in _async_generate
response = await self._async_rate_limited_completion(
File "/usr/local/lib/python3.10/dist-packages/phoenix/experimental/evals/models/openai.py", line 308, in _async_rate_limited_completion
return await _async_completion(**kwargs)
File "/usr/local/lib/python3.10/dist-packages/phoenix/experimental/evals/models/rate_limiters.py", line 245, in wrapper
raise RateLimitError(f"Exceeded max ({self._max_rate_limit_retries}) retries")
phoenix.experimental.evals.models.rate_limiters.RateLimitError: Exceeded max (10) retries
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-14-16de7d2f67d0> in <cell line: 5>()
3 relevance_evaluator = RelevanceEvaluator(eval_model)
4
----> 5 hallucination_eval_df, qa_correctness_eval_df = run_evals(
6 dataframe=queries_df,
7 evaluators=[hallucination_evaluator, qa_correctness_evaluator],
/usr/local/lib/python3.10/dist-packages/phoenix/experimental/evals/functions/classify.py in run_evals(dataframe, evaluators, provide_explanation, use_function_calling_if_available, verbose, concurrency)
446 defaultdict(dict) for _ in range(len(evaluators))
447 ]
--> 448 for evaluator_index, row_index, label, score, explanation in executor.run(payloads):
449 eval_results[evaluator_index][row_index]["label"] = label
450 eval_results[evaluator_index][row_index]["score"] = score

ValueError: not enough values to unpack (expected 5, got 2)
Sangram You can track the status of this bug here: https://github.com/Arize-ai/phoenix/issues/2216 For now, the underlying reason the worker was failing is that you've been hitting rate limit errors. Try lowering your concurrency or configuring the internal rate limiter on your OpenAIModel. Here's some code you can try adding to your script to configure the rate limiter:
from phoenix.experimental.evals.models.rate_limiters import RateLimiter

eval_model._rate_limiter = RateLimiter(
    rate_limit_error=eval_model._openai.RateLimitError,
    max_rate_limit_retries=10,
    initial_per_second_request_rate=1,
    maximum_per_second_request_rate=2,
    enforcement_window_minutes=1,
    rate_reduction_factor=0.7,
)
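If you'd rather just lower the concurrency, run_evals takes a concurrency argument (you can see it in the signature in your traceback). A minimal sketch reusing the names from your snippet; the value of 2 is just an example, tune it to your rate limits:

hallucination_eval_df, qa_correctness_eval_df = run_evals(
    dataframe=queries_df,
    evaluators=[hallucination_evaluator, qa_correctness_evaluator],
    concurrency=2,  # example value; fewer concurrent requests means fewer rate limit errors
)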
Exception in worker on attempt 1: raised BadRequestError('Error code: 400 - {\'error\': {\'message\': "The response was filtered due to the prompt triggering Azure OpenAI\'s content management policy. Please modify your prompt and retry. To learn more about our content filtering policies please read our documentation: https://go.microsoft.com/fwlink/?linkid=2198766", \'type\': None, \'param\': \'prompt\', \'code\': \'content_filter\', \'status\': 400}}')
Requeuing...
[attempts 2 through 11 raised the same BadRequestError and were requeued each time]
Exception in worker: Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/phoenix/experimental/evals/functions/executor.py", line 127, in consumer
output[index] = generate_task.result()
File "/usr/lib/python3.10/asyncio/futures.py", line 201, in result
raise self._exception.with_traceback(self._exception_tb)
File "/usr/lib/python3.10/asyncio/tasks.py", line 232, in __step
result = coro.send(None)
File "/usr/local/lib/python3.10/dist-packages/phoenix/experimental/evals/functions/classify.py", line 408, in _arun_eval
label, score, explanation = await payload.evaluator.aevaluate(
File "/usr/local/lib/python3.10/dist-packages/phoenix/experimental/evals/evaluators.py", line 145, in aevaluate
unparsed_output = await verbose_model._async_generate(
File "/usr/local/lib/python3.10/dist-packages/phoenix/experimental/evals/models/openai.py", line 257, in _async_generate
response = await self._async_rate_limited_completion(
File "/usr/local/lib/python3.10/dist-packages/phoenix/experimental/evals/models/openai.py", line 310, in _async_rate_limited_completion
return await _async_completion(**kwargs)
File "/usr/local/lib/python3.10/dist-packages/phoenix/experimental/evals/models/rate_limiters.py", line 227, in wrapper
return await fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/phoenix/experimental/evals/models/openai.py", line 308, in _async_completion
raise e
File "/usr/local/lib/python3.10/dist-packages/phoenix/experimental/evals/models/openai.py", line 302, in _async_completion
res = await self._async_client.chat.completions.create(**kwargs)
File "/usr/local/lib/python3.10/dist-packages/openai/resources/chat/completions.py", line 1322, in create
return await self._post(
File "/usr/local/lib/python3.10/dist-packages/openai/_base_client.py", line 1725, in post
return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
File "/usr/local/lib/python3.10/dist-packages/openai/_base_client.py", line 1428, in request
return await self._request(
File "/usr/local/lib/python3.10/dist-packages/openai/_base_client.py", line 1519, in _request
raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'error': {'message': "The response was filtered due to the prompt triggering Azure OpenAI's content management policy. Please modify your prompt and retry. To learn more about our content filtering policies please read our documentation: https://go.microsoft.com/fwlink/?linkid=2198766", 'type': None, 'param': 'prompt', 'code': 'content_filter', 'status': 400}}
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-17-24dca313976c> in <cell line: 5>()
3 relevance_evaluator = RelevanceEvaluator(eval_model)
4
----> 5 hallucination_eval_df, qa_correctness_eval_df = run_evals(
6 dataframe=queries_df,
7 evaluators=[hallucination_evaluator, qa_correctness_evaluator],
/usr/local/lib/python3.10/dist-packages/phoenix/experimental/evals/functions/classify.py in run_evals(dataframe, evaluators, provide_explanation, use_function_calling_if_available, verbose, concurrency)
446 defaultdict(dict) for _ in range(len(evaluators))
447 ]
--> 448 for evaluator_index, row_index, label, score, explanation in executor.run(payloads):
449 eval_results[evaluator_index][row_index]["label"] = label
450 eval_results[evaluator_index][row_index]["score"] = score
ValueError: not enough values to unpack (expected 5, got 2)
Hi Sangram, the last error is from our fallback logic and isn't fixed yet, but your underlying issue is different now. Before, you were hitting a rate limit error; now your requests are being blocked by Azure OpenAI's content management policy. You'll probably have to change your input so that it isn't rejected by the model, unfortunately.
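If you can't tell which rows are tripping the filter, one low-tech way to isolate them is to run the evals in small batches and set aside the batches that fail. A minimal sketch reusing the names from your cells; the batch size and the skip-on-failure behavior are just illustrative, not a built-in Phoenix feature:

batch_size = 10  # illustrative; smaller batches isolate bad rows faster
results, skipped = [], []
for start in range(0, len(queries_df), batch_size):
    batch = queries_df.iloc[start:start + batch_size]
    try:
        results.append(run_evals(
            dataframe=batch,
            evaluators=[hallucination_evaluator, qa_correctness_evaluator],
        ))
    except Exception:
        # A row in this batch is likely being rejected by Azure's content
        # filter; set the batch aside and inspect/rewrite those inputs.
        skipped.append(batch)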
