Slice and Dice SQL with SQL Ninja¶
I write a lot of SQL. It's my primary language for various reasons which we won't get into at here, but even after using it for so long, SQL can still be a pain for some repetitive tasks.
Problem¶
Here's a SQL expression for pivoting some expenses by category:
SELECT
DATEFORMAT(entry_dt, 'yyyy-mm') as year_month,
SUM(CASE WHEN category = 'katanas' THEN amount END) AS katanas,
SUM(CASE WHEN category = 'shurikens' THEN amount END) AS shurikens,
SUM(CASE WHEN category = 'hooks' THEN amount END) AS hooks,
SUM(CASE WHEN category = 'shoes' THEN amount END) AS shoes,
SUM(CASE WHEN category = 'disguises' THEN amount END) AS disguises,
SUM(
CASE WHEN category NOT IN ('katanas','shurikens','hooks','shoes','disguises')
THEN amount END
) AS other_amount,
SUM(amount) AS total_amount
FROM expenses
ORDER BY year_month, total_amount
Can we see the repetitive SQL going on? Here's a hint: SUM(CASE WHEN(...)) AS ...
.
Wouldn't it be nice to make a list of categories and have the SQL generated for you? Wouldn't it be nice to use generated SQL directly from command line?
Solution¶
SQL is a declarative language. HTML is a declarative language. HTML has so many template languages to choose from, but SQL has so few.
Enter SQL Ninja.
SQL Ninja uses the popular Jinja Templating Engine which typically generates HTML, but we can use it to generate SQL.
pip install sql-ninja
Now, let's create a simple SQL file:
cat > hello.sql <<SQL
SELECT 'world'
SQL
From here, we can use the sql
command provided by sql-ninja to transform the
SQL.
sql hello.sql
# => SELECT 'world'
That wasn't very exciting. Let's try to template it:
cat > hello.sql <<SQL
SELECT ''
SQL
And render the template:
sql hello.sql
# => SELECT ''
What happened? We got a blank message. This is because we didn't provide one to the template engine. Try this instead:
sql hello.sql msg='World!!!'
# => SELECT 'World!!!'
We did it!
Starting a New Project¶
In the real world, we work with work with projects that have a lot of other files. We probably don't want our main
projects root directory littered with *.sql
files. SQL Ninja recommends putting templates in the following directory:
.
└── sql
└── templates
└── hello.sql
So let's do that with our hello.sql
mkdir -p sql/templates
mv hello.sql sql/templates/
And from here, we can run our exact sql
command from before:
sql hello.sql msg='World!!!'
# => SELECT 'World!!!'
It still works because SQL Ninja has sensible defaults. Files in current the current working directory and
sql/templates
are the default search paths.
Default Searched Directories
- Current working directory
sql/templates
Please use one or the other, but not both.
Build The Query¶
Create sql/templates/expenses.sql
DATEFORMAT(entry_dt, 'yyyy-mm') as year_month,
SUM(CASE WHEN category = 'SQL' THEN amount END) AS SQL,
SUM(CASE WHEN category = 'Python' THEN amount END) AS Python,
SUM(
CASE WHEN category NOT IN ('SQL','Python')
THEN amount END
) AS other_amount,
SUM(amount) AS total_amount
FROM expenses
ORDER BY year_month, total_amount
Let's run it:
sql expenses.sql categories='food transportation'
#=>
SELECT
DATEFORMAT(entry_dt, 'yyyy-mm') as year_month,
SUM(CASE WHEN category = 'food' THEN amount END) AS food,
SUM(CASE WHEN category = 'transportation' THEN amount END) AS transportation,
SUM(
CASE WHEN category NOT IN ('food','transportation')
THEN amount END
) AS other_amount,
SUM(amount) AS total_amount
FROM expenses
ORDER BY year_month, total_amount
Subquery Workflow¶
Jinja supports including templates in other templates. This is perfect for sub queries!
Let's break up the expenses query into two:
- summaries by category
- pivot the results
-- sql/templates/expenses/summary.sql
SELECT
DATEFORMAT(entry_dt, 'yyyy-mm') as year_month,
category,
SUM(amount) AS amount
FROM expenses
ORDER BY year_month, category
-- sql/templates/expenses/pivot.sqlSELECT
DATEFORMAT(entry_dt, 'yyyy-mm') as year_month,
SUM(CASE WHEN category = 'SQL' THEN amount END) AS SQL,
SUM(CASE WHEN category = 'Python' THEN amount END) AS Python,
SUM(
CASE WHEN category NOT IN ('SQL','Python')
THEN amount END
) AS other_amount,
SUM(amount) AS total_amount
FROM (
{% include 'expenses/summary.sql' %}
) AS summary
ORDER BY year_month, total_amount
Render the new template with: sql expenses/pivot.sql
and override categories
or not:
SELECT
DATEFORMAT(entry_dt, 'yyyy-mm') as year_month,
SUM(CASE WHEN category = 'katanas' THEN amount END) AS katanas,
SUM(CASE WHEN category = 'shurikens' THEN amount END) AS shurikens,
SUM(CASE WHEN category = 'hooks' THEN amount END) AS hooks,
SUM(CASE WHEN category = 'shoes' THEN amount END) AS shoes,
SUM(CASE WHEN category = 'disguises' THEN amount END) AS disguises,
SUM(
CASE WHEN category NOT IN ('katanas','shurikens','hooks','shoes','disguises')
THEN amount END
) AS other_amount,
SUM(amount) AS total_amount
FROM (
SELECT
DATEFORMAT(entry_dt, 'yyyy-mm') as year_month,
category,
SUM(amount) AS amount
FROM expenses
ORDER BY year_month, category
) AS summary
ORDER BY year_month, total_amount
Docker¶
A docker image has been built if installing Python is a problem:
docker pull ddrscott/sql-ninja
Docker containers don't have access to local file system, so we need to mount the volume into the container.
docker run --rm -v $PWD:/app -w /app ddrscott/sql-ninja expenses/pivot.sql
# ^ ^ ^ ^ ^
# | | | | |
# | | | | + the template
# | | | |
# | | | + the image
# | | |
# | | + start in /app path
# | |
# | + volume mount current path to /app
# |
# + remove container when complete
Make an alias if ya want:
# Pick one or name it whatever is most memorable to you:
alias sql='docker run --rm -v $PWD:/app -w /app ddrscott/sql-ninja'
alias sqln='docker run --rm -v $PWD:/app -w /app ddrscott/sql-ninja'
alias sqlninja='docker run --rm -v $PWD:/app -w /app ddrscott/sql-ninja'
Why not Jinja SQL?¶
The jinjasql
project has 452 stars as of this writing and it uses Jinja, too. The project
requires the user to write they SQL as Python strings. If we're going to write SQL in Python strings, then we
don't need a templating engine. Just format strings and use Python. With SQL Ninja we want to write SQL in .sql
files. Period.
https://hashedin.com/blog/introducing-jinjasql-generate-sql-using-jinja-templates/)
Conclusion¶
I'm currently using it in my day job and it has helped parameterize several aspects of some large SQL statements.
Jinja's macro
and include
features really standout and make writing SQL almost dreamy. Almost.
Please let us know how this project is working for you and how to make it better!