Why are LLMs bad at SQL?
In response to a LinkedIn comment by Sasha
Did I hear someone say SQL and LLMs?!
TL;DR - Publicly available LLMs are inherently bad at SQL. Furthermore, they have little hope of ever getting good at it.
Why
I'm not surprised by the low scores. I'd imagine that since the dawn of the language and the Internet, the number of high-quality SQL examples with high-quality explanations has been zero. Compare that with code in other languages: the ratio must be ∞ to 0. I'm surprised LLMs can even spell SQL :). Even counting my own musings on SQL, I round the number of quality examples down to zero.
When developers are first getting started, they're not looking for tutorials on SQL. They're looking to more popular languages to make Snake games, Flappy Birds, and sort binary trees to cheat on their homework. It's a fool's errand to write about SQL. SQL has a public audience of none. From business is born SQL, and businesses like to keep their expensive SQL a secret. From business is bred the deepest knowledge of SQL, and it's rarely tied to a reasonable explanation. It'll never be a business priority to properly teach complex SQL.
SQL is my Love Language
SQL is a language that transcends understanding by human intelligence and artificial intelligence. It's tied so closely to the data it's trying to manage that it's almost inseparable. In the same way DNA describes an animal, SQL describes our data. The permutations of SQL sequences are infinitely complex and mean nothing on their own. SQL can only be understood with its data.
Yet, there is hope for SQL and LLMs to live in harmony. It'll be in the private sector, when businesses invest in creating good training data - pairs of questions and answers tailored to their private datasets.
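To make "pairs of questions and answers" a little more concrete, here is a minimal sketch of what such training data might look like. Everything in it is an assumption made for illustration: the `orders` table, its rows, the two questions, and the `build_examples` helper are all hypothetical, and SQLite stands in for whatever database a business actually runs. The idea is simply to pair each plain-language question with the SQL that answers it, then run that SQL against the data so only working pairs are kept.

```python
import json
import sqlite3

# Hypothetical private schema and rows, invented for illustration.
SCHEMA = """
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL);
INSERT INTO orders (customer, total) VALUES
    ('acme', 120.0), ('acme', 80.0), ('globex', 50.0);
"""

# Question/answer pairs: a plain-language question and the SQL that answers it.
PAIRS = [
    {
        "question": "How much has each customer spent in total?",
        "sql": "SELECT customer, SUM(total) FROM orders "
               "GROUP BY customer ORDER BY customer",
    },
    {
        "question": "How many orders are there?",
        "sql": "SELECT COUNT(*) FROM orders",
    },
]


def build_examples(pairs, schema=SCHEMA):
    """Run each pair's SQL against an in-memory database; keep pairs that execute."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(schema)
    examples = []
    for pair in pairs:
        try:
            rows = conn.execute(pair["sql"]).fetchall()
        except sqlite3.Error:
            continue  # drop pairs whose SQL does not run against this schema
        examples.append({**pair, "result": rows})
    conn.close()
    return examples


if __name__ == "__main__":
    # One JSON object per line, a common shape for fine-tuning datasets.
    for example in build_examples(PAIRS):
        print(json.dumps(example))
```

Storing the query result next to each pair keeps the SQL tied to the data it describes, which is the whole point: the query alone says very little.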
The years I've invested in the SQL language have made me oddly sentimental. I'll close by borrowing from Calum Scott:
I'd climb every mountain
And swim every ocean
Just to write SQL
And fix what I've broken
'Cause I need data please
That you are the reason