Had a really great discussion with Manas from the community. He used defog's sql-eval library as the ground truth for evaluating an LLM-as-a-judge on generated SQL:
https://github.com/defog-ai/sql-eval
Solid results. There was lots of debate on whether adding the database schema to the judge prompt would improve them, and that is definitely an area worth investing research in.
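For anyone who wants to try the schema question themselves, here is a minimal sketch of the general idea. It is not Manas's setup. It assumes each generated query already has a ground-truth correct/incorrect label, for example from sql-eval comparing its results against a golden query. It also assumes you supply your own `call_judge` function wrapping whatever LLM you use. `Example`, `build_judge_prompt`, and `judge_agreement` are hypothetical names made up for illustration. The sketch builds the judge prompt with or without the schema and measures how often the judge's verdict agrees with the ground truth.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Example:
    # Hypothetical record: one generated query plus its ground-truth label.
    question: str
    generated_sql: str
    is_correct: bool              # assumed to come from ground truth, e.g. sql-eval's result comparison
    schema: Optional[str] = None  # DDL text for the target database, if available


def build_judge_prompt(ex: Example, include_schema: bool) -> str:
    """Assemble a yes/no judge prompt, optionally prefixed with the database schema."""
    parts = []
    if include_schema and ex.schema:
        parts.append(f"Database schema:\n{ex.schema}")
    parts.append(f"Question: {ex.question}")
    parts.append(f"Candidate SQL:\n{ex.generated_sql}")
    parts.append("Does the SQL correctly answer the question? Reply YES or NO.")
    return "\n\n".join(parts)


def judge_agreement(
    examples: Sequence[Example],
    call_judge: Callable[[str], str],  # assumption: your own wrapper around an LLM call
    include_schema: bool,
) -> float:
    """Fraction of examples where the judge's YES/NO verdict matches the ground-truth label."""
    if not examples:
        raise ValueError("need at least one example")
    hits = 0
    for ex in examples:
        reply = call_judge(build_judge_prompt(ex, include_schema))
        verdict = reply.strip().upper().startswith("YES")
        hits += verdict == ex.is_correct
    return hits / len(examples)
```

Running `judge_agreement` twice on the same labeled set, once with `include_schema=True` and once with `False`, turns the debate into a direct A/B comparison.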