

Francisco Ingham and Jon Luo are two of the community members leading the change on the SQL integrations. We’re really excited to write this blog post with them going over all the tips and tricks they’ve learned doing so. We’re even more excited to announce that we’ll be doing an hour-long webinar with them to discuss these learnings and field other related questions. This webinar will be on March 22nd - sign up at the link below:

The LangChain library has multiple SQL chains and even an SQL agent aimed at making interacting with data stored in SQL as easy as possible.

Most of an enterprise’s data is traditionally stored in SQL databases. With the amount of valuable data stored there, business intelligence (BI) tools that make it easy to query and understand that data have risen in popularity. But what if you could just interact with a SQL database in natural language? With LLMs today, that is possible. LLMs have an understanding of SQL and are able to write it pretty well. However, there are several issues that make this a non-trivial task.

So LLMs can write SQL - what more is needed?

The main issue is hallucination. LLMs can write SQL, but they are prone to making up tables, making up fields, and generally writing SQL that would not be valid if executed against your database. So one of the big challenges we face is how to ground the LLM in reality so that it produces valid SQL. The main idea to fix this (we will go into more detail below) is to provide the LLM with knowledge about what actually exists in the database and tell it to write a SQL query consistent with that.

However, this runs into a second issue - the context window length. LLMs have a context window which limits the amount of text they can operate over. This is relevant because SQL databases often contain a lot of information. So if we were to naively pass in all the data to ground the LLM in reality, we would likely run into this issue.

A third issue is a more basic one: sometimes the LLM just messes up. The SQL it writes may be incorrect for whatever reason, or it could be correct but just return an unexpected result. What do we do then? Do we give up?

The (High Level) Solutions

When thinking about how to tackle these issues, it’s informative to think about how we as humans tackle them. If we can replicate the steps that we would take to solve those problems, we can help the LLM do so as well. So let’s think about what a data analyst would do if they were asked to answer a BI question.

When data analysts query SQL databases, there are a few things they normally do that help them make the right queries. For example, they usually make a sample query beforehand to understand what the data looks like. They can look at the schema of the tables, or even certain rows. This can be thought of as the data analyst learning what the data looks like so that when they write a SQL query in the future it is grounded in what actually exists.

Data analysts also don’t usually look at all the data (or thousands of rows) at the same time - they may limit any exploratory queries to the top K rows, or look at summary stats instead. This can yield some hints at how to get around the context window limitations.

And finally, if a data analyst hits an error, they don’t just give up - they learn from the error and write a new query.
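The analyst habits described in this post (look at the schema and a few sample rows, cap exploratory results at the top K rows, and retry after an error) can be sketched roughly as follows. This is a minimal illustration using Python's built-in `sqlite3` module, not LangChain's actual implementation. The names `describe_database`, `answer_question`, and `generate_sql` are hypothetical; `generate_sql` stands in for whatever LLM call you use, and is faked here so the example runs on its own.

```python
import sqlite3


def describe_database(conn, sample_rows=3):
    """Return CREATE statements plus a few sample rows per table.

    Keeping only `sample_rows` rows per table keeps the description small
    enough to fit in a prompt instead of dumping the whole database.
    """
    parts = []
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    for name, create_sql in tables:
        cursor = conn.execute(f'SELECT * FROM "{name}" LIMIT ?', (sample_rows,))
        rows = cursor.fetchall()
        sample = "\n".join(repr(row) for row in rows)
        parts.append(f"{create_sql}\n-- sample rows from {name}:\n{sample}")
    return "\n\n".join(parts)


def answer_question(conn, question, generate_sql, max_attempts=3, max_rows=10):
    """Ask `generate_sql` for a query, run it, and retry with the error on failure.

    `generate_sql(question, context, error)` is a hypothetical callable that
    wraps an LLM; `error` is None on the first attempt.
    """
    context = describe_database(conn)
    error = None
    for _ in range(max_attempts):
        sql = generate_sql(question, context, error)
        try:
            # Only fetch the top rows, as an analyst would when exploring.
            return conn.execute(sql).fetchmany(max_rows)
        except sqlite3.Error as exc:
            # Feed the failure back so the next attempt can correct itself.
            error = f"Query {sql!r} failed: {exc}"
    raise RuntimeError(f"No valid query after {max_attempts} attempts: {error}")


# Toy database for the demonstration (assumed schema, not from the post).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("ada", 10.0), ("bob", 25.5), ("ada", 4.5)],
)

attempts = []


def fake_generate_sql(question, context, error):
    """Stand-in for an LLM: hallucinates a column first, then fixes it."""
    attempts.append(error)
    if error is None:
        return "SELECT SUM(amount) FROM orders"  # 'amount' does not exist
    return "SELECT SUM(total) FROM orders"


result = answer_question(conn, "What is the total revenue?", fake_generate_sql)
print(result)  # [(40.0,)]
```

The first generated query refers to a column that does not exist, so SQLite raises an error; the error text is passed back and the second attempt succeeds. In a real setup the schema description and the error message would both be included in the prompt sent to the model.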
