- Published on
Writing SQL You'll Thank Yourself For
- Authors

- Name
- Alex Peng
- @aJinTonic
SQL is the most widely used programming language that most developers never really study. People learn enough to get data out and move on. The result is queries that work in development but behave unexpectedly at scale, or are so cryptic that nobody wants to touch them six months later.
Here are the practices that make a real difference.
Be Explicit About What You're Selecting
-- Don't do this
SELECT * FROM orders JOIN users ON orders.user_id = users.id
-- Do this
SELECT
orders.id,
orders.total,
orders.created_at,
users.email,
users.name
FROM orders
JOIN users ON orders.user_id = users.id
SELECT * couples your application to the table schema. When someone adds a column, your query silently returns more data. When you add a second table with a column named id, your SELECT * now returns two ambiguous id columns. Being explicit is self-documenting and prevents surprises.
Use CTEs to Build Up Complexity
Common Table Expressions (CTEs) let you name intermediate results and compose them, rather than nesting subqueries:
-- Hard to read: nested subqueries
SELECT * FROM (
SELECT user_id, COUNT(*) as order_count FROM (
SELECT * FROM orders WHERE status = 'completed'
) completed_orders
GROUP BY user_id
) counts WHERE order_count >= 5
-- Easy to read: CTEs
WITH completed_orders AS (
SELECT user_id, id
FROM orders
WHERE status = 'completed'
),
high_value_users AS (
SELECT user_id, COUNT(*) AS order_count
FROM completed_orders
GROUP BY user_id
HAVING COUNT(*) >= 5
)
SELECT
users.email,
high_value_users.order_count
FROM high_value_users
JOIN users ON users.id = high_value_users.user_id
CTEs read like a story. Each step is named and explicit. You can test each CTE in isolation by running it independently.
Understand NULL Before It Bites You
NULL is not a value, it's the absence of a value. Comparisons with NULL always return NULL, not true or false:
SELECT NULL = NULL -- NULL (not TRUE)
SELECT NULL != NULL -- NULL (not TRUE)
SELECT NULL IS NULL -- TRUE
This trips people up in WHERE clauses:
-- This does NOT find rows where discount is NULL
SELECT * FROM orders WHERE discount != 0
-- This does
SELECT * FROM orders WHERE discount != 0 OR discount IS NULL
And in aggregations:
-- COUNT(*) counts all rows including NULLs
-- COUNT(column) counts only non-NULL values
SELECT
COUNT(*) AS total_orders,
COUNT(discount) AS orders_with_discount
FROM orders
Use COALESCE(value, default) to handle NULLs explicitly:
SELECT
order_id,
COALESCE(discount, 0) AS discount -- treats NULL as 0
FROM orders
Avoid Functions on Indexed Columns in WHERE
Wrapping a column in a function in a WHERE clause often prevents the index from being used:
-- ❌ Can't use index on created_at, function call makes it a computed value
SELECT * FROM orders
WHERE EXTRACT(YEAR FROM created_at) = 2024
-- ✓ Range condition uses the index
SELECT * FROM orders
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01'
The rule: put transformations on the literal side, not the column side.
Use Window Functions Instead of Self-Joins
Window functions let you compute aggregate values across a set of rows while keeping each row:
-- Get each order and its rank within that customer's orders by date
SELECT
order_id,
customer_id,
created_at,
total,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY created_at DESC
) AS order_rank
FROM orders
This returns every order with a order_rank of 1 for each customer's most recent order, 2 for the second most recent, etc. Without window functions, you'd need a self-join or subquery.
Common window functions: ROW_NUMBER(), RANK(), DENSE_RANK(), LAG(), LEAD(), SUM() OVER (...), AVG() OVER (...).
Format Your Queries
Nobody wants to debug a 200-character single-line query. A formatting convention I use:
SELECT -- keywords capitalized
o.id, -- one column per line
o.total,
u.email
FROM orders o -- aliases for shorter table names
JOIN users u ON u.id = o.user_id
WHERE
o.status = 'completed'
AND o.created_at >= '2024-01-01'
ORDER BY o.created_at DESC
LIMIT 100
Indentation, line breaks, and consistent keyword casing make queries scannable. SQL isn't compiled, it's read by people.
The Mindset Shift
SQL is a declarative language: you describe what you want, not how to get it. The query optimizer figures out the plan. Your job is to write queries that clearly express intent, give the optimizer enough information to use indexes efficiently, and remain readable when you come back to them in three months.
Treat SQL like any other code, it deserves the same care you'd give a function.