Effective Evaluation Options for Data-Driven Teams
I would say there are a couple of options we see teams use, depending on the data you have. If you only have example questions and no ground-truth answers, the following can work well:

1. Chunk-level evals: evaluate each returned chunk; is the chunk actually useful for answering the question? Vector search might return something semantically similar that has no hope of answering your question.
2. Q&A evals: did you answer the question correctly given the chunks returned? This eval helps you understand whether the LLM made up the answer or used only the data in the chunks.

You can start here and get a lot of value. If you don't have questions, I recommend hand-creating even just 100; you can do a lot with a little. There are automated question-generator options as well, but I'd suggest the hand-crafted route, since it's typically pretty easy and fast.
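A minimal sketch of both eval types, assuming a simple harness of your own: `run_chunk_evals` and `run_qa_eval` are hypothetical names, and the `toy_*_judge` functions are keyword-overlap stand-ins for whatever grader you plug in (in practice this is usually an LLM-as-judge call):

```python
from typing import Callable, List

def run_chunk_evals(question: str, chunks: List[str],
                    judge: Callable[[str, str], bool]) -> List[bool]:
    """Chunk-level eval: for each retrieved chunk, ask the judge
    whether the chunk is useful for answering the question."""
    return [judge(question, chunk) for chunk in chunks]

def run_qa_eval(question: str, chunks: List[str], answer: str,
                judge: Callable[[str, str, str], bool]) -> bool:
    """Q&A eval: given the retrieved chunks, is the answer grounded
    in them, or did the model make it up?"""
    context = "\n".join(chunks)
    return judge(question, context, answer)

# Toy judges for illustration only; swap in an LLM-as-judge here.
def toy_chunk_judge(question: str, chunk: str) -> bool:
    # "Useful" if the chunk shares at least two words with the question.
    q_words = set(question.lower().split())
    return len(q_words & set(chunk.lower().split())) >= 2

def toy_answer_judge(question: str, context: str, answer: str) -> bool:
    # "Grounded" if every answer word appears somewhere in the context.
    return set(answer.lower().split()) <= set(context.lower().split())

chunks = ["the refund policy allows returns within 30 days",
          "our office dog is named biscuit"]
chunk_results = run_chunk_evals("what is the refund policy",
                                chunks, toy_chunk_judge)
grounded = run_qa_eval("what is the refund policy", chunks,
                       "returns within 30 days", toy_answer_judge)
```

The point of the structure is that the judge is pluggable: you can start with cheap heuristics to wire up the loop, then swap in an LLM grader without changing the harness.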
