Understanding QA Correctness vs. Hallucination Evaluation
All - just to make sure I understand the difference between QA Correctness and Hallucination evaluation:
QA Correctness: does something like is_correct(question, answer, docs). That is, it takes all the docs as a single string and says, "You must determine whether the given answer correctly answers the question based on the reference text."
Hallucination: does something like is_hallucinated(question, answer, docs). That is, it takes all the docs as a single string and says, "you must use the reference text to determine if the answer to the question contains false information, if the answer is a hallucination of facts. ... A 'hallucination' in this context refers to an answer that is not based on the reference text or assumes information that is not available in the reference text."
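To make the comparison concrete, here is a rough sketch of how I picture the two calls. This is only my mental model, not the library's actual code: the helper name build_eval_prompt and the prompt layout are hypothetical, and the hallucination instruction is cut off where my quote above has the "...". As far as I can tell, both evaluators see the same three inputs and differ only in the instruction text.

```python
from typing import List

# Instruction text as quoted in this post. The hallucination one is
# truncated at the point where the quote above is elided ("...").
QA_INSTRUCTION = (
    "You must determine whether the given answer correctly answers "
    "the question based on the reference text."
)
HALLUCINATION_INSTRUCTION = (
    "you must use the reference text to determine if the answer to the "
    "question contains false information, if the answer is a "
    "hallucination of facts."
)


def build_eval_prompt(
    instruction: str, question: str, answer: str, docs: List[str]
) -> str:
    """Hypothetical helper: join all docs into one reference string and
    place it next to the question and answer. The layout is an assumption."""
    reference = "\n\n".join(docs)
    return (
        f"{instruction}\n\n"
        f"[Question]: {question}\n"
        f"[Reference text]: {reference}\n"
        f"[Answer]: {answer}"
    )


docs = ["Doc one text.", "Doc two text."]
qa_prompt = build_eval_prompt(QA_INSTRUCTION, "Some question?", "Some answer.", docs)
hallucination_prompt = build_eval_prompt(
    HALLUCINATION_INSTRUCTION, "Some question?", "Some answer.", docs
)
```

If that picture is right, everything after the first paragraph of the two prompts is identical, so any difference in the labels has to come from the wording of the instruction.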
Question: How are they different/the same? Specifically, how do I interpret these four buckets?
Hallucination = Yes hallucinated, QACorrect = Yes correct
Hallucination = Yes hallucinated, QACorrect = Not correct
Hallucination = Not hallucinated, QACorrect = Yes correct
Hallucination = Not hallucinated, QACorrect = Not correct
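In case it helps anyone answering, this is roughly how I am sorting my results into the four buckets above. It is just a sketch: bucket_name and count_buckets are my own hypothetical helpers, and the sample labels are made up for illustration, not real evaluation output.

```python
from collections import Counter
from typing import Iterable, Tuple


def bucket_name(hallucinated: bool, correct: bool) -> str:
    """Map one (hallucinated, correct) label pair to its bucket name."""
    h = "Yes hallucinated" if hallucinated else "Not hallucinated"
    c = "Yes correct" if correct else "Not correct"
    return f"Hallucination = {h}, QACorrect = {c}"


def count_buckets(pairs: Iterable[Tuple[bool, bool]]) -> Counter:
    """Count how many (hallucinated, correct) pairs fall into each bucket."""
    return Counter(bucket_name(h, c) for h, c in pairs)


# Made-up (hallucinated, correct) labels, purely for illustration.
sample = [(True, True), (False, True), (False, True), (False, False)]
counts = count_buckets(sample)

for name, n in counts.items():
    print(f"{name}: {n}")
```

Counting this way at least shows how often the two evaluators disagree, which are the rows I am least sure how to read.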
