Hi Team, I had a question regarding fine-tuning BART (or any generator model) for question answering tasks. In the Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks paper, they use a dataset with question-and-answer pairs to fine-tune the BART generator model (correct me if I am wrong here), and all the resources I went through had question-and-answer pairs for fine-tuning. If I only have data with questions and their corresponding passages, how can I fine-tune the generator model to improve answers for my custom data? I would greatly appreciate your guidance on this. Looking forward to your response.
Dropping in the referenced paper: https://arxiv.org/pdf/2005.11401.pdf. Let me caveat with: I'm not seeing a lot of actual success in practice across teams that are trying to improve RAG with fine-tuning. I see a ton of write-ups in the ecosystem, and I see a number of teams testing it, but to set expectations, those teams are normally still trying to convince themselves the improvement is measurable and worth the effort. In the paper referenced, you are correct: they train the generator on the answers (y), so matching their approach would mean labeling your data. I haven't seen a fine-tuning approach that doesn't use the correct answers, which means you would have to label your data either way. Question: are you trying to improve answers to common questions, or are you trying to improve the form and function of those responses?
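To make the labeling requirement concrete, here's a minimal sketch (function names and the input format are my own illustration, not the paper's actual code) of how question/passage/answer triples get turned into seq2seq training examples for a BART-style generator: the encoder input is the question concatenated with a retrieved passage, and the decoder target is the labeled answer y, which is exactly the piece Adarsh's data is missing.

```python
# Sketch: formatting (question, passage, answer) triples into seq2seq
# training examples for a BART-style generator. In the RAG setup the
# generator learns to emit the answer y given question + retrieved
# passage; the answer is the label you would have to produce.
# All names and the template string here are illustrative assumptions.

def build_training_example(question: str, passage: str, answer: str) -> dict:
    """Concatenate question and passage into the encoder input;
    the labeled answer becomes the decoder target."""
    return {
        "input_text": f"question: {question} context: {passage}",
        "target_text": answer,  # <-- the label that has to come from somewhere
    }

def build_dataset(triples):
    """Map a list of (question, passage, answer) triples to examples."""
    return [build_training_example(q, p, a) for q, p, a in triples]

if __name__ == "__main__":
    triples = [
        ("Who wrote Hamlet?",
         "Hamlet is a tragedy written by William Shakespeare.",
         "William Shakespeare"),
    ]
    examples = build_dataset(triples)
    print(examples[0]["input_text"])
    print(examples[0]["target_text"])
```

With only question/passage pairs, the `target_text` field is empty, which is why every fine-tuning recipe I've seen still needs a labeling pass of some kind.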
I think your question gets at a really core issue: if my goal is to impart private knowledge into an LLM, I'd want to do it with just the context (knowledge) and not need question/answer pairs. I'm not sure fine-tuning, as it stands, really does a good job of that: imparting facts and knowledge versus shaping the output for downstream tasks. The jury is out; I'll let you know as I see more real results.
Hi Jason, I appreciate your prompt reply. To address your question: as you mentioned, our current approach provides the top 5 documents from our private data to the generator model as context through prompt engineering to obtain the final answer. However, we have found the responses unsatisfactory. Consequently, we are considering fine-tuning the generator model to better align the responses with our private data. The challenge we face is that we possess only question and corresponding passage pairs; we do not have labeled answers. Hence, we are actively exploring strategies to fine-tune the generator model using the question/passage pairs we do have, ultimately aligning it more closely with our private data.
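For reference, the prompt-engineering step described above (top 5 retrieved passages stuffed into the generator's context) might look roughly like the sketch below. The template wording is my assumption, not their actual prompt; the point is just to show where the retrieved passages sit relative to the question.

```python
# Sketch of assembling a generation prompt from the question and the
# top-k retrieved passages. The instruction wording and passage
# numbering scheme are illustrative assumptions, not a known-good prompt.

def build_prompt(question: str, passages: list, k: int = 5) -> str:
    """Join the top-k passages into a numbered context block and
    append the question, ending at 'Answer:' for the model to complete."""
    context = "\n\n".join(
        f"[{i + 1}] {p}" for i, p in enumerate(passages[:k])
    )
    return (
        "Answer the question using only the passages below.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

if __name__ == "__main__":
    print(build_prompt(
        "What is the refund policy?",
        ["Refunds are issued within 30 days.", "Contact support to start a claim."],
    ))
```

When responses from a setup like this are unsatisfactory, it's worth checking retrieval quality and prompt wording first, since both are cheaper to iterate on than fine-tuning.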
Hey Jason/Adarsh - we were building AI bots for travel clients, and a lot of them faced this problem of not having question/answer pairs because they'd never implemented any Q&A-type features before. To solve this, we were forced to run an extensive debugging process: a large part of the travel company would interact with the bots and provide feedback on the answers to commonly asked questions. We'd feed that feedback into a knowledge graph and use it to constrain the output. (Some people we talked to fed it back into the vector database, but I found that sub-optimal, since a lot of the sample queries were actually badly phrased questions that required additional context to act on - e.g. "i want to go trip to the south".) Happy to chat more on our approach - it worked for us, and I agree there's a huge problem here for non-large enterprises that don't have existing Q/A sets (and I don't think synthetic data is a truly helpful answer here).
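To sketch the feedback-constraint idea in heavily simplified form (the real system used a knowledge graph; this stand-in is just a curated-answer lookup keyed on normalized question text, and every name here is hypothetical):

```python
import re

def normalize(question: str) -> str:
    """Lowercase and strip punctuation so near-duplicate phrasings
    of a common question map to the same key."""
    return re.sub(r"[^a-z0-9 ]", "", question.lower()).strip()

class ReviewedAnswers:
    """Stand-in for the knowledge-graph layer: human-reviewed answers
    to commonly asked questions, consulted before falling through to
    the generator. A toy sketch of the feedback loop, not the real system."""

    def __init__(self):
        self._store = {}

    def add_feedback(self, question: str, reviewed_answer: str):
        """Record a reviewer-approved answer for a question."""
        self._store[normalize(question)] = reviewed_answer

    def lookup(self, question: str):
        """Return the curated answer if one exists, else None
        (meaning: let the LLM generate freely)."""
        return self._store.get(normalize(question))
```

The useful property is that reviewer feedback accumulates as hard constraints on high-traffic questions, while badly phrased or novel queries still fall through to the generator instead of polluting the retrieval index.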
