Hey guys, I'm new to this community. I'm building an LLM chatbot on top of a custom knowledge base, and I want to use Phoenix Evals to evaluate the Q&A metric. I've tried it and it works well. But is there a UI-based version for running evals, like MLflow or LangSmith?
Barath C. The team has been talking through some ideas related to this — expect us to roll something out eventually. Quick question: are you most interested in saving/tracking eval runs across datasets/dataframes, or is the data you're evaluating related to LLM spans from LlamaIndex or LangChain? Maybe both?
