Evaluating a CSV question set against a LangGraph app in Phoenix: 'run_experiment' vs 'run_evals'?
Hello, I'm new to data science and LLM evaluations, so apologies if my question is obvious 😊 Could you walk me through Phoenix's capabilities and help me solve my task?

Here is my setup: I have Phoenix deployed locally and a LangGraph application, and I've managed to set up tracing, so I can already see really useful, detailed traces in the Phoenix app. I can also pull spans from Phoenix and evaluate them with 'run_evals', since the chain contains RETRIEVAL spans. In addition, I have a set of questions in a CSV file to use as an evaluation dataset.

My goal is to send those questions to the app and evaluate the answers using the Phoenix evaluators: https://docs.arize.com/phoenix/evaluation/how-to-evals/running-pre-tested-evals

As far as I understand, I can use 'run_experiment' for this: import the questions from the CSV, send them to the app, receive the answers, and score them with LLM evaluators or custom evaluators. But in that case my results don't contain 'retrieved_documents', because I only receive the final answer from my application. That limits which evaluations I can run, since there is no reference context to ground them in.

So, is it possible to somehow push my dataset through the application so that I get the same set of spans in the chain (including retrieval), and then evaluate those with 'run_evals'? Or maybe you can share some ideas on a better way to handle my case? Thanks in advance!
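To make the gap concrete, here is a simplified sketch of my current flow. This is pandas-based pseudocode of what I'm doing, not exact Phoenix API usage: 'ask_app' stands in for my LangGraph invocation, the 'input'/'output'/'reference' column names are my assumption about the dataframe shape the evaluators want, and the actual Phoenix calls are commented out because they need a running server and an LLM key.

```python
import io
import pandas as pd

# 1) Load the question dataset from CSV (inlined here for illustration).
csv_text = "question\nWhat is Phoenix?\nHow do I trace a LangGraph app?\n"
questions = pd.read_csv(io.StringIO(csv_text))

# 2) Send each question to the application. 'ask_app' is a placeholder for
#    my real LangGraph call, which is what emits the RETRIEVAL spans.
def ask_app(question: str) -> str:
    return f"(answer to: {question})"  # placeholder for graph.invoke(...)

results = pd.DataFrame(
    {
        "input": questions["question"],
        "output": questions["question"].map(ask_app),
        # 3) This is the column I'm missing: the app only returns the
        #    answer, so I have no retrieved documents to use as the
        #    reference context that grounded evaluators need.
        "reference": None,
    }
)

# 4) If I could fill in "reference", I believe I could then run the
#    pre-tested evaluators on this dataframe, roughly like:
# from phoenix.evals import OpenAIModel, QAEvaluator, run_evals
# eval_dfs = run_evals(
#     dataframe=results,
#     evaluators=[QAEvaluator(OpenAIModel(model="gpt-4o-mini"))],
#     provide_explanation=True,
# )

print(list(results.columns))
```

The question boils down to step 3: how do I get the retrieval spans that my traced chain already produces lined up with the rows of this dataset, instead of a 'reference' column full of nulls?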
