Evaluating GPT-4 Answers for Document Chunking Strategies
Hello, I'm working on a Q&A system in which a document is split into chunks using several text-splitting techniques, and GPT-4 generates answers based on those chunks. My goal is to evaluate, for each splitter, whether the generated answer comes from the correct chunk (the one where the ground-truth answer is located). First, I divide the document into chunks using the different splitting strategies. Then, for each question, I have a chunk (splitter_chunk_ground_truth) that contains the correct answer from the document. Finally, I have GPT-4 generate an answer for each splitting strategy. To evaluate the effectiveness of each splitter, I want to check whether the answer GPT-4 generated for that splitter comes from the correct chunk or not. Does anyone know an approach that would help with this?
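
To make the check concrete, here is a minimal sketch of one possible approach, assuming a simple token-overlap score is an acceptable proxy for "the answer comes from this chunk". Everything except splitter_chunk_ground_truth is a hypothetical name introduced for illustration (tokenize, overlap_score, best_matching_chunk, answer_from_correct_chunk, and the toy data). GPT-4 often paraphrases, so a lexical score like this may well miss matches that a semantic comparison would catch; treat it as a baseline rather than a definitive method.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(answer, chunk):
    """Fraction of the answer's tokens (with multiplicity) that also occur in the chunk."""
    answer_counts = Counter(tokenize(answer))
    chunk_counts = Counter(tokenize(chunk))
    total = sum(answer_counts.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared token.
    shared = sum((answer_counts & chunk_counts).values())
    return shared / total


def best_matching_chunk(answer, chunks):
    """Index of the chunk with the highest overlap score (first one on ties)."""
    scores = [overlap_score(answer, chunk) for chunk in chunks]
    return max(range(len(chunks)), key=scores.__getitem__)


def answer_from_correct_chunk(answer, chunks, splitter_chunk_ground_truth):
    """True if the chunk that best matches the answer is the ground-truth chunk."""
    best = chunks[best_matching_chunk(answer, chunks)]
    return best == splitter_chunk_ground_truth


# Toy data (assumption: made up purely for illustration).
example_chunks = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "The Great Wall of China is more than 20,000 kilometres long.",
]
example_answer = "The Eiffel Tower was completed in 1889."

print(answer_from_correct_chunk(example_answer, example_chunks, example_chunks[0]))
```

Running this check for every question and every splitter, then averaging the boolean results per splitter, would give a rough "answer came from the right chunk" rate for comparing the splitting strategies.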
