Hi, I implemented PhoenixEvals as follows:
for index, d in enumerate(self.__ragas_dicts):
    if d.get("ground_truth")[0] is not None:
        result = evaluate(
            dataset=Dataset.from_dict(d),
            metrics=self.__metrics,
        )
        results.append(result)

As part of my main script there are also some other calculations going on, e.g., computing a bert_score:
bert_score_dict = metric.compute(predictions=[pred], references=[ref], lang="de")

Depending on the order, I sometimes get this error:

ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-54' coro=<AsyncClient.aclose() done,
RuntimeError: Event loop is closed

Does anyone have an idea how to avoid this issue? What's happening?
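For what it's worth, here is a minimal, self-contained reproduction of the underlying failure mode (not tied to Ragas or httpx specifically; any attempt to use an already-closed event loop raises the same RuntimeError that the AsyncClient.aclose() cleanup task hits):

```python
import asyncio

# Create a loop and close it, as happens when one framework tears down
# its event loop while another still has cleanup work scheduled on it.
loop = asyncio.new_event_loop()
loop.close()

coro = asyncio.sleep(0)
try:
    # Using the closed loop fails the same way the aclose() task does.
    loop.run_until_complete(coro)
except RuntimeError as err:
    message = str(err)
    print(message)  # Event loop is closed
finally:
    coro.close()  # avoid a "coroutine was never awaited" warning
```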
Hey Daniel. It looks like you are using Ragas, but I can't quite tell how your program is organized. How is this Python program executed? Could it be Ragas-related? I don't recognize the error or call stack.
The program is organized as follows:
extractor: Extractor = Extractor()
metrics_and_loaders: list[tuple[Metric, Loader]] = [(AnswerRelevancyMetric(), AnswerRelevancyLoader())]
for metric, loader in metrics_and_loaders:
    for evaluation_input_with_id in extractor.extract(extraction_strategy=PhoenixSpansAsEvaluationInput()):
        try:
            loader.load(
                evaluation_output=EvaluationOutput.from_evaluation_result(
                    evaluation_result=metric.evaluate(evaluation_input_with_id),
                    span_id=evaluation_input_with_id.span_id,
                )
            )
        except Exception:
            ...  # error handling elided in this snippet

extractor gets the span_df from the Phoenix client
metrics contain Phoenix eval metrics, Ragas metrics, and Hugging Face evaluate metrics such as bert_score
loader writes every metric back to the Phoenix client
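As a rough sketch of that shape (stand-in names and a dummy metric, since the real Extractor/Metric/Loader classes aren't shown here), the pipeline is a classic extract-evaluate-load loop:

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class EvalInput:
    # stand-in for the real evaluation input; fields are illustrative
    span_id: str
    answer: str

class Metric(Protocol):
    def evaluate(self, inp: EvalInput) -> float: ...

class LengthMetric:
    # dummy metric standing in for Phoenix/Ragas/bert_score evaluators
    def evaluate(self, inp: EvalInput) -> float:
        return float(len(inp.answer))

def load(span_id: str, score: float, store: dict) -> None:
    # stand-in loader: the real one writes back to the Phoenix client
    store[span_id] = score

store: dict = {}
for inp in [EvalInput("span-1", "hello"), EvalInput("span-2", "hi")]:
    load(inp.span_id, LengthMetric().evaluate(inp), store)

print(store)  # {'span-1': 5.0, 'span-2': 2.0}
```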
evaluate is a wrapper around the calculation depending on the chosen framework, e.g. for phoenix:
self.__single_evaluation_dataframe_to_evaluation_result(
    run_evals(
        dataframe=PhoenixEvaluationAdapter.evaluation_input_to_phoenix_input(evaluation_input).to_dataframe(),
        evaluators=[self.get_evaluator(self.__chat_model)],
        provide_explanation=True,
        verbose=True,
        concurrency=1,
    )
)

Does this make sense?
Normally each call to llm_classify will start and manage its own event loop. I don't quite understand what's causing your error yet, but it seems like a new event loop is closed before the previous one is cleaned up. All of this can be avoided by running llm_classify with the run_sync=True option.
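For context, the idea behind a run_sync-style option is roughly the following (a sketch with made-up names, not the actual Phoenix API): give every call its own event loop and tear it down completely before returning, so no cleanup task can outlive its loop:

```python
import asyncio

def run_sync(coro):
    # asyncio.run creates a fresh loop, runs the coroutine, cancels any
    # leftover tasks, and closes the loop before returning, so cleanup
    # never lands on a loop that another call already closed.
    return asyncio.run(coro)

async def fake_eval(x: int) -> int:
    # stand-in for an async LLM evaluation call
    await asyncio.sleep(0)
    return x * 2

result = run_sync(fake_eval(21))
print(result)  # 42
```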
That sounds promising! Thanks a lot, will check that out 🙌
great! please let me know if it works out or if you continue to see errors
