Chaninder R. I'll let some others jump in as well, but if you decide to go the LLM route - which I think is totally valid - then I'd recommend:
1. Doing as much preprocessing of your text as you can to get it into a standardized format. I'm not sure what your text looks like, but the more noise you remove before passing it to your model, the better your model will do.
2. Looking at a small example we have that experiments with different models for structured text extraction. The experiment starts with a predefined golden dataset and uses it to benchmark how well each model performs the task. It could be a good starting point, even if you only pull out the "Set Up LangChain Task" section, which is where the extraction tool is actually defined.
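To make both suggestions a bit more concrete, here is a rough sketch in plain Python. It is not the example mentioned above, and everything in it is an assumption for illustration: `normalize_text`, `exact_match_accuracy`, `toy_extractor`, and the invoice-style sample data are hypothetical names and data, and the toy regex extractor only stands in for whatever model call you end up using. The idea is just to clean the text first, then score an extractor against a small golden dataset.

```python
import re
import unicodedata


def normalize_text(raw: str) -> str:
    """Hypothetical cleanup step; the right rules depend on what your text looks like."""
    text = unicodedata.normalize("NFKC", raw)  # fold compatibility characters (e.g. non-breaking spaces)
    text = re.sub(r"<[^>]+>", " ", text)       # drop HTML-like tags
    text = re.sub(r"\s+", " ", text)           # collapse runs of whitespace
    return text.strip()


def exact_match_accuracy(extract, golden):
    """Fraction of golden examples where the extractor output equals the expected dict.

    `extract` is any callable taking cleaned text and returning a dict;
    `golden` is a list of (raw_text, expected_dict) pairs.
    """
    if not golden:
        return 0.0
    hits = sum(
        1 for raw, expected in golden
        if extract(normalize_text(raw)) == expected
    )
    return hits / len(golden)


def toy_extractor(text: str) -> dict:
    """Stand-in for a model call (assumption): pulls an invoice id with a regex."""
    match = re.search(r"Invoice #(\d+)", text)
    return {"invoice_id": match.group(1) if match else None}


# Tiny, made-up golden dataset: noisy input paired with the expected extraction.
golden = [
    ("<p>Invoice   #123</p>", {"invoice_id": "123"}),
    ("Invoice\u00a0#456\n\n", {"invoice_id": "456"}),
    ("no id here", {"invoice_id": None}),
]

if __name__ == "__main__":
    print(exact_match_accuracy(toy_extractor, golden))
```

To compare models, you could wrap each one in its own function with the same signature as `toy_extractor` and call `exact_match_accuracy` once per model. Exact match is a strict metric, so a per-field score may be more useful if your outputs have many fields.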