Hi Team, we have multiple multi-agent systems built with the Agno framework and we want to evaluate them. How can we evaluate these multi-agent chatbots or systems? Is there any support for this in Arize or Phoenix? For simple RAG we evaluate based on the question, answer, and context, but how should we evaluate multi-agent systems? What metrics are relevant here, especially for Agno agents? Please guide us.
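One common approach (a sketch, not an official Agno or Phoenix recipe): instrument the Agno agents so their traces land in Phoenix, then run span-level LLM-as-a-judge evals over the exported spans, e.g. routing/handoff reasonableness and tool-call correctness per agent step, plus the usual QA-correctness and hallucination evals on the root spans for end-to-end quality. The snippet below assumes a recent Phoenix where evals live under `phoenix.evals`; the judge model, template text, and column handling are illustrative.

```python
# Sketch: span-level eval of agent steps with an LLM judge.
# Assumptions: Phoenix is already receiving traces from the Agno agents,
# and `phoenix.evals` is available (newer Phoenix versions).
import phoenix as px
from phoenix.evals import OpenAIModel, llm_classify

# Export spans (tool calls, agent handoffs, ...) and keep the fields we need.
spans_df = px.Client().get_spans_dataframe()
spans_df = spans_df.rename(
    columns={"attributes.input.value": "input", "attributes.output.value": "output"}
)

# Illustrative custom template: judge whether each agent step was a reasonable
# choice for the request. Swap in your own criteria (planning, tool selection, ...).
ROUTING_TEMPLATE = """You are judging one step taken by an agent in a multi-agent system.
User request: {input}
Agent/tool chosen and its output: {output}
Was this step a reasonable choice for the request? Answer "correct" or "incorrect"."""

eval_model = OpenAIModel(model="gpt-4o-mini")  # any supported judge model works here
step_evals = llm_classify(
    dataframe=spans_df,
    model=eval_model,
    template=ROUTING_TEMPLATE,
    rails=["correct", "incorrect"],
    provide_explanation=True,
)
```

For end-to-end quality you can still run the pre-built hallucination, QA-correctness, and relevance evals the same way on the root span of each conversation, so the agent-level metrics complement rather than replace the RAG-style ones.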
Hi Mikyo, we need to log evaluation runs on custom data traces in the Phoenix Cloud app. Could you please guide us on how to log the traces there?
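A minimal sketch of one way to do this, assuming Phoenix Cloud and span-level evals: point the client at your Phoenix space, authenticate with an API key, and log a dataframe of eval results keyed by span id. The endpoint and key values below are placeholders.

```python
import os
import pandas as pd
import phoenix as px
from phoenix.trace import SpanEvaluations

# Placeholders: your Phoenix Cloud endpoint and API key.
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://app.phoenix.arize.com"
os.environ["PHOENIX_CLIENT_HEADERS"] = "api_key=YOUR_PHOENIX_API_KEY"

# Eval results must be indexed by the span id of the trace rows they annotate.
eval_df = pd.DataFrame(
    {"label": ["correct"], "score": [1.0], "explanation": ["matches the reference"]},
    index=pd.Index(["<span_id>"], name="context.span_id"),
)

# Attach the evaluations to the matching spans so they show up in the Phoenix UI.
px.Client().log_evaluations(SpanEvaluations(eval_name="my_custom_eval", dataframe=eval_df))
```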
How can I use open-source models like Llama 3.2 or Gemma via the Groq API?
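One option (a sketch, not the only path): Phoenix's evals package includes a LiteLLM wrapper, and LiteLLM can route to Groq-hosted models. The model ids below are illustrative, so check Groq's current model list before using them.

```python
import os
import pandas as pd
from phoenix.evals import (
    LiteLLMModel,
    llm_classify,
    HALLUCINATION_PROMPT_TEMPLATE,
    HALLUCINATION_PROMPT_RAILS_MAP,
)

os.environ["GROQ_API_KEY"] = "YOUR_GROQ_API_KEY"  # placeholder

# Illustrative Groq model id routed through LiteLLM; a Gemma id such as
# "groq/gemma2-9b-it" would work the same way if Groq serves it.
eval_model = LiteLLMModel(model="groq/llama-3.1-8b-instant")

# Tiny example dataframe; the hallucination template expects input/reference/output columns.
queries_df = pd.DataFrame(
    {
        "input": ["What is the capital of France?"],
        "reference": ["Paris is the capital of France."],
        "output": ["The capital of France is Paris."],
    }
)

hallucination_evals = llm_classify(
    dataframe=queries_df,
    model=eval_model,
    template=HALLUCINATION_PROMPT_TEMPLATE,
    rails=list(HALLUCINATION_PROMPT_RAILS_MAP.values()),
    provide_explanation=True,
)
```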
Hey [private user] [private user], we are using Phoenix as our main eval framework for RAG evaluation. Now we want to use Gemini, Claude, and other LLMs (including open-source models) for the evaluation. How can we use those LLMs? By default Phoenix uses GPT? We tried following the documentation but didn't find it helpful. If you have any notebook or code showing how to use LLMs other than GPT for evals, please share it and help us with this.
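A sketch of one way to swap the judge model, assuming a recent Phoenix where the model wrappers live in `phoenix.evals` (your traceback below shows the older `phoenix.experimental.evals` path); the model names and credentials are placeholders. Any of these wrappers can be passed wherever the OpenAI model is passed today.

```python
from phoenix.evals import (
    AnthropicModel,   # Claude (needs ANTHROPIC_API_KEY)
    GeminiModel,      # Gemini (needs Google / Vertex AI credentials)
    LiteLLMModel,     # open-source models via LiteLLM, e.g. Ollama- or Groq-hosted
    HallucinationEvaluator,
    QAEvaluator,
    RelevanceEvaluator,
)

# Pick one judge model; the model names are illustrative.
eval_model = AnthropicModel(model="claude-3-5-sonnet-20240620")
# eval_model = GeminiModel(model="gemini-1.5-pro")
# eval_model = LiteLLMModel(model="ollama/llama3")

# Then reuse it exactly where you currently pass the OpenAI model.
hallucination_evaluator = HallucinationEvaluator(eval_model)
qa_correctness_evaluator = QAEvaluator(eval_model)
relevance_evaluator = RelevanceEvaluator(eval_model)
```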
Mikyo, I'm getting a value error: ValueError: signal only works in main thread of the main interpreter
Can we use an open-source LLM for Phoenix evals instead of OpenAI GPT?
Exception in worker on attempt 1: raised BadRequestError('Error code: 400 - {\'error\': {\'message\': "The response was filtered due to the prompt triggering Azure OpenAI\'s content management policy. Please modify your prompt and retry. To learn more about our content filtering policies please read our documentation: https://go.microsoft.com/fwlink/?linkid=2198766", \'type\': None, \'param\': \'prompt\', \'code\': \'content_filter\', \'status\': 400}}')
Requeuing...
[attempts 2 through 11 raised the same BadRequestError (Azure OpenAI content_filter, error code 400) and were requeued each time]
Exception in worker: Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/phoenix/experimental/evals/functions/executor.py", line 127, in consumer
output[index] = generate_task.result()
File "/usr/lib/python3.10/asyncio/futures.py", line 201, in result
raise self._exception.with_traceback(self._exception_tb)
File "/usr/lib/python3.10/asyncio/tasks.py", line 232, in __step
result = coro.send(None)
File "/usr/local/lib/python3.10/dist-packages/phoenix/experimental/evals/functions/classify.py", line 408, in _arun_eval
label, score, explanation = await payload.evaluator.aevaluate(
File "/usr/local/lib/python3.10/dist-packages/phoenix/experimental/evals/evaluators.py", line 145, in aevaluate
unparsed_output = await verbose_model._async_generate(
File "/usr/local/lib/python3.10/dist-packages/phoenix/experimental/evals/models/openai.py", line 257, in _async_generate
response = await self._async_rate_limited_completion(
File "/usr/local/lib/python3.10/dist-packages/phoenix/experimental/evals/models/openai.py", line 310, in _async_rate_limited_completion
return await _async_completion(**kwargs)
File "/usr/local/lib/python3.10/dist-packages/phoenix/experimental/evals/models/rate_limiters.py", line 227, in wrapper
return await fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/phoenix/experimental/evals/models/openai.py", line 308, in _async_completion
raise e
File "/usr/local/lib/python3.10/dist-packages/phoenix/experimental/evals/models/openai.py", line 302, in _async_completion
res = await self._async_client.chat.completions.create(**kwargs)
File "/usr/local/lib/python3.10/dist-packages/openai/resources/chat/completions.py", line 1322, in create
return await self._post(
File "/usr/local/lib/python3.10/dist-packages/openai/_base_client.py", line 1725, in post
return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
File "/usr/local/lib/python3.10/dist-packages/openai/_base_client.py", line 1428, in request
return await self._request(
File "/usr/local/lib/python3.10/dist-packages/openai/_base_client.py", line 1519, in _request
raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'error': {'message': "The response was filtered due to the prompt triggering Azure OpenAI's content management policy. Please modify your prompt and retry. To learn more about our content filtering policies please read our documentation: https://go.microsoft.com/fwlink/?linkid=2198766", 'type': None, 'param': 'prompt', 'code': 'content_filter', 'status': 400}}
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-17-24dca313976c> in <cell line: 5>()
3 relevance_evaluator = RelevanceEvaluator(eval_model)
4
----> 5 hallucination_eval_df, qa_correctness_eval_df = run_evals(
6 dataframe=queries_df,
7 evaluators=[hallucination_evaluator, qa_correctness_evaluator],
/usr/local/lib/python3.10/dist-packages/phoenix/experimental/evals/functions/classify.py in run_evals(dataframe, evaluators, provide_explanation, use_function_calling_if_available, verbose, concurrency)
446 defaultdict(dict) for _ in range(len(evaluators))
447 ]
--> 448 for evaluator_index, row_index, label, score, explanation in executor.run(payloads):
449 eval_results[evaluator_index][row_index]["label"] = label
450 eval_results[evaluator_index][row_index]["score"] = score
ValueError: not enough values to unpack (expected 5, got 2)
