Hi, I implemented PhoenixEvals as follows:
for index, d in enumerate(self.__ragas_dicts):
    if d.get("ground_truth")[0] is not None:
        result = evaluate(
            dataset=Dataset.from_dict(d),
            metrics=self.__metrics,
        )
        results.append(result)

As part of my main script there are also some other calculations going on, e.g., computing a bert_score:
bert_score_dict = metric.compute(predictions=[pred], references=[ref], lang="de")

Depending on the order, I sometimes get this error:

ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-54' coro=<AsyncClient.aclose() done,
RuntimeError: Event loop is closed

Does anyone have an idea how to avoid this issue? What's happening?
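For what it's worth, here is a minimal, self-contained reproduction of the underlying failure mode (not tied to Ragas or httpx specifically; any attempt to use an already-closed event loop raises the same RuntimeError that the AsyncClient.aclose() cleanup task hits):

```python
import asyncio

# Create a loop and close it, as happens when one framework tears down
# its event loop while another still has cleanup work scheduled on it.
loop = asyncio.new_event_loop()
loop.close()

coro = asyncio.sleep(0)
try:
    # Using the closed loop fails the same way the aclose() task does.
    loop.run_until_complete(coro)
except RuntimeError as err:
    message = str(err)
    print(message)  # Event loop is closed
finally:
    coro.close()  # avoid a "coroutine was never awaited" warning
```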
Hey Daniel. It looks like you are using Ragas, but I can't quite tell how your program is organized. How is this Python program executed? Could it be Ragas-related? I don't recognize the error or call stack.
The program is organized as follows:
extractor: Extractor = Extractor()
metrics_and_loaders: list[tuple[Metric, Loader]] = [(AnswerRelevancyMetric(), AnswerRelevancyLoader())]
for metric, loader in metrics_and_loaders:
    for evaluation_input_with_id in extractor.extract(extraction_strategy=PhoenixSpansAsEvaluationInput()):
        try:
            loader.load(
                evaluation_output=EvaluationOutput.from_evaluation_result(
                    evaluation_result=metric.evaluate(evaluation_input_with_id),
                    span_id=evaluation_input_with_id.span_id,
                )
            )
        except Exception:
            ...  # error handling elided in this snippet

extractor gets the span_df from the Phoenix client
metrics contain Phoenix eval metrics, Ragas metrics, and Hugging Face evaluate metrics such as bert_score
loader writes every metric back to the Phoenix client
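As a rough sketch of that shape (stand-in names and a dummy metric, since the real Extractor/Metric/Loader classes aren't shown here), the pipeline is a classic extract-evaluate-load loop:

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class EvalInput:
    # stand-in for the real evaluation input; fields are illustrative
    span_id: str
    answer: str

class Metric(Protocol):
    def evaluate(self, inp: EvalInput) -> float: ...

class LengthMetric:
    # dummy metric standing in for Phoenix/Ragas/bert_score evaluators
    def evaluate(self, inp: EvalInput) -> float:
        return float(len(inp.answer))

def load(span_id: str, score: float, store: dict) -> None:
    # stand-in loader: the real one writes back to the Phoenix client
    store[span_id] = score

store: dict = {}
for inp in [EvalInput("span-1", "hello"), EvalInput("span-2", "hi")]:
    load(inp.span_id, LengthMetric().evaluate(inp), store)

print(store)  # {'span-1': 5.0, 'span-2': 2.0}
```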
evaluate is a wrapper around the calculation depending on the chosen framework, e.g. for phoenix:
self.__single_evaluation_dataframe_to_evaluation_result(
    run_evals(
        dataframe=PhoenixEvaluationAdapter.evaluation_input_to_phoenix_input(evaluation_input).to_dataframe(),
        evaluators=[self.get_evaluator(self.__chat_model)],
        provide_explanation=True,
        verbose=True,
        concurrency=1,
    )
)

Does this make sense?
Normally each call to llm_classify will start and manage its own event loop. I don't quite understand what's causing your error yet, but it seems like a new event loop is closed before the previous one is cleaned up. All of this can be avoided by running llm_classify with the run_sync=True option.
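For context, the idea behind a run_sync-style option is roughly the following (a sketch with made-up names, not the actual Phoenix API): give every call its own event loop and tear it down completely before returning, so no cleanup task can outlive its loop:

```python
import asyncio

def run_sync(coro):
    # asyncio.run creates a fresh loop, runs the coroutine, cancels any
    # leftover tasks, and closes the loop before returning, so cleanup
    # never lands on a loop that another call already closed.
    return asyncio.run(coro)

async def fake_eval(x: int) -> int:
    # stand-in for an async LLM evaluation call
    await asyncio.sleep(0)
    return x * 2

result = run_sync(fake_eval(21))
print(result)  # 42
```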
That sounds promising! Thanks a lot, will check that out 🙌
great! please let me know if it works out or if you continue to see errors
