Hi Team, we have multiple multi-agent systems built with the Agno framework and we want to evaluate them. How can we evaluate these multi-agent chatbots or systems? Is there any support for this in Arize or Phoenix? For simple RAG we evaluate based on the question, answer, and context, but how should we evaluate multi-agent systems? What metrics are relevant here, especially for Agno agents? Please guide us.
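One common approach (a sketch, not an official Agno or Phoenix recipe): instrument the Agno agents so their traces land in Phoenix, then run span-level LLM-as-a-judge evals over the exported spans, e.g. routing/handoff reasonableness and tool-call correctness per agent step, plus the usual QA-correctness and hallucination evals on the root spans for end-to-end quality. The snippet below assumes a recent Phoenix where evals live under `phoenix.evals`; the judge model, template text, and column handling are illustrative.

```python
# Sketch: span-level eval of agent steps with an LLM judge.
# Assumptions: Phoenix is already receiving traces from the Agno agents,
# and `phoenix.evals` is available (newer Phoenix versions).
import phoenix as px
from phoenix.evals import OpenAIModel, llm_classify

# Export spans (tool calls, agent handoffs, ...) and keep the fields we need.
spans_df = px.Client().get_spans_dataframe()
spans_df = spans_df.rename(
    columns={"attributes.input.value": "input", "attributes.output.value": "output"}
)

# Illustrative custom template: judge whether each agent step was a reasonable
# choice for the request. Swap in your own criteria (planning, tool selection, ...).
ROUTING_TEMPLATE = """You are judging one step taken by an agent in a multi-agent system.
User request: {input}
Agent/tool chosen and its output: {output}
Was this step a reasonable choice for the request? Answer "correct" or "incorrect"."""

eval_model = OpenAIModel(model="gpt-4o-mini")  # any supported judge model works here
step_evals = llm_classify(
    dataframe=spans_df,
    model=eval_model,
    template=ROUTING_TEMPLATE,
    rails=["correct", "incorrect"],
    provide_explanation=True,
)
```

For end-to-end quality you can still run the pre-built hallucination, QA-correctness, and relevance evals the same way on the root span of each conversation, so the agent-level metrics complement rather than replace the RAG-style ones.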
Hi Mikyo, we need to log evaluation runs on custom data traces in the Phoenix Cloud app. Could you please guide us on how to log the traces there?
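A minimal sketch of one way to do this, assuming Phoenix Cloud and span-level evals: point the client at your Phoenix space, authenticate with an API key, and log a dataframe of eval results keyed by span id. The endpoint and key values below are placeholders.

```python
import os
import pandas as pd
import phoenix as px
from phoenix.trace import SpanEvaluations

# Placeholders: your Phoenix Cloud endpoint and API key.
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://app.phoenix.arize.com"
os.environ["PHOENIX_CLIENT_HEADERS"] = "api_key=YOUR_PHOENIX_API_KEY"

# Eval results must be indexed by the span id of the trace rows they annotate.
eval_df = pd.DataFrame(
    {"label": ["correct"], "score": [1.0], "explanation": ["matches the reference"]},
    index=pd.Index(["<span_id>"], name="context.span_id"),
)

# Attach the evaluations to the matching spans so they show up in the Phoenix UI.
px.Client().log_evaluations(SpanEvaluations(eval_name="my_custom_eval", dataframe=eval_df))
```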
How can I use open-source models like Llama 3.2 or Gemma via the Groq API?
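One option (a sketch, not the only path): Phoenix's evals package includes a LiteLLM wrapper, and LiteLLM can route to Groq-hosted models. The model ids below are illustrative, so check Groq's current model list before using them.

```python
import os
import pandas as pd
from phoenix.evals import (
    LiteLLMModel,
    llm_classify,
    HALLUCINATION_PROMPT_TEMPLATE,
    HALLUCINATION_PROMPT_RAILS_MAP,
)

os.environ["GROQ_API_KEY"] = "YOUR_GROQ_API_KEY"  # placeholder

# Illustrative Groq model id routed through LiteLLM; a Gemma id such as
# "groq/gemma2-9b-it" would work the same way if Groq serves it.
eval_model = LiteLLMModel(model="groq/llama-3.1-8b-instant")

# Tiny example dataframe; the hallucination template expects input/reference/output columns.
queries_df = pd.DataFrame(
    {
        "input": ["What is the capital of France?"],
        "reference": ["Paris is the capital of France."],
        "output": ["The capital of France is Paris."],
    }
)

hallucination_evals = llm_classify(
    dataframe=queries_df,
    model=eval_model,
    template=HALLUCINATION_PROMPT_TEMPLATE,
    rails=list(HALLUCINATION_PROMPT_RAILS_MAP.values()),
    provide_explanation=True,
)
```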
Hey [private user] [private user], we are using Phoenix as our main eval framework for RAG evaluation. Now we want to use Gemini, Claude, and other LLMs (including open-source models) for the evaluation. How can we use those LLMs? By default Phoenix uses GPT? We tried following the documentation but didn't find it helpful. If you have any notebook or code showing how to use LLMs other than GPT for evals, please share it and help us with this.
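A sketch of one way to swap the judge model, assuming a recent Phoenix where the model wrappers live in `phoenix.evals` (your traceback below shows the older `phoenix.experimental.evals` path); the model names and credentials are placeholders. Any of these wrappers can be passed wherever the OpenAI model is passed today.

```python
from phoenix.evals import (
    AnthropicModel,   # Claude (needs ANTHROPIC_API_KEY)
    GeminiModel,      # Gemini (needs Google / Vertex AI credentials)
    LiteLLMModel,     # open-source models via LiteLLM, e.g. Ollama- or Groq-hosted
    HallucinationEvaluator,
    QAEvaluator,
    RelevanceEvaluator,
)

# Pick one judge model; the model names are illustrative.
eval_model = AnthropicModel(model="claude-3-5-sonnet-20240620")
# eval_model = GeminiModel(model="gemini-1.5-pro")
# eval_model = LiteLLMModel(model="ollama/llama3")

# Then reuse it exactly where you currently pass the OpenAI model.
hallucination_evaluator = HallucinationEvaluator(eval_model)
qa_correctness_evaluator = QAEvaluator(eval_model)
relevance_evaluator = RelevanceEvaluator(eval_model)
```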
Mikyo, I'm getting a value error: ValueError: signal only works in main thread of the main interpreter
Can we use an open-source LLM for Phoenix evals instead of OpenAI GPT?
Exception in worker on attempt 1: raised BadRequestError('Error code: 400 - {\'error\': {\'message\': "The response was filtered due to the prompt triggering Azure OpenAI\'s content management policy. Please modify your prompt and retry. To learn more about our content filtering policies please read our documentation: https://go.microsoft.com/fwlink/?linkid=2198766", \'type\': None, \'param\': \'prompt\', \'code\': \'content_filter\', \'status\': 400}}')
Requeuing...
[attempts 2 through 11 raised the same BadRequestError (Azure OpenAI content_filter, error code 400) and were requeued each time]
Exception in worker: Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/phoenix/experimental/evals/functions/executor.py", line 127, in consumer
output[index] = generate_task.result()
File "/usr/lib/python3.10/asyncio/futures.py", line 201, in result
raise self._exception.with_traceback(self._exception_tb)
File "/usr/lib/python3.10/asyncio/tasks.py", line 232, in __step
result = coro.send(None)
File "/usr/local/lib/python3.10/dist-packages/phoenix/experimental/evals/functions/classify.py", line 408, in _arun_eval
label, score, explanation = await payload.evaluator.aevaluate(
File "/usr/local/lib/python3.10/dist-packages/phoenix/experimental/evals/evaluators.py", line 145, in aevaluate
unparsed_output = await verbose_model._async_generate(
File "/usr/local/lib/python3.10/dist-packages/phoenix/experimental/evals/models/openai.py", line 257, in _async_generate
response = await self._async_rate_limited_completion(
File "/usr/local/lib/python3.10/dist-packages/phoenix/experimental/evals/models/openai.py", line 310, in _async_rate_limited_completion
return await _async_completion(**kwargs)
File "/usr/local/lib/python3.10/dist-packages/phoenix/experimental/evals/models/rate_limiters.py", line 227, in wrapper
return await fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/phoenix/experimental/evals/models/openai.py", line 308, in _async_completion
raise e
File "/usr/local/lib/python3.10/dist-packages/phoenix/experimental/evals/models/openai.py", line 302, in _async_completion
res = await self._async_client.chat.completions.create(**kwargs)
File "/usr/local/lib/python3.10/dist-packages/openai/resources/chat/completions.py", line 1322, in create
return await self._post(
File "/usr/local/lib/python3.10/dist-packages/openai/_base_client.py", line 1725, in post
return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
File "/usr/local/lib/python3.10/dist-packages/openai/_base_client.py", line 1428, in request
return await self._request(
File "/usr/local/lib/python3.10/dist-packages/openai/_base_client.py", line 1519, in _request
raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'error': {'message': "The response was filtered due to the prompt triggering Azure OpenAI's content management policy. Please modify your prompt and retry. To learn more about our content filtering policies please read our documentation: https://go.microsoft.com/fwlink/?linkid=2198766", 'type': None, 'param': 'prompt', 'code': 'content_filter', 'status': 400}}
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-17-24dca313976c> in <cell line: 5>()
3 relevance_evaluator = RelevanceEvaluator(eval_model)
4
----> 5 hallucination_eval_df, qa_correctness_eval_df = run_evals(
6 dataframe=queries_df,
7 evaluators=[hallucination_evaluator, qa_correctness_evaluator],
/usr/local/lib/python3.10/dist-packages/phoenix/experimental/evals/functions/classify.py in run_evals(dataframe, evaluators, provide_explanation, use_function_calling_if_available, verbose, concurrency)
446 defaultdict(dict) for _ in range(len(evaluators))
447 ]
--> 448 for evaluator_index, row_index, label, score, explanation in executor.run(payloads):
449 eval_results[evaluator_index][row_index]["label"] = label
450 eval_results[evaluator_index][row_index]["score"] = score
ValueError: not enough values to unpack (expected 5, got 2)
