Database Indexes Under the Hood

Every engineer eventually reaches the point where a query that worked fine in development starts timing out in production. Nine times out of ten, the answer is an index. But blindly adding indexes is its own problem. To use them well, you need to understand what they actually are.

What Is an Index?

An index is a separate data structure that the database maintains alongside your table. It stores a subset of your columns in a way that makes lookups fast, usually a B-tree (balanced tree).

Think of it like the index at the back of a textbook. Instead of reading every page to find "referential equality", you jump to the index, find the page number, and go directly there.

Without an index, a query like this performs a full table scan, it reads every row:

SELECT * FROM orders WHERE customer_id = 42;

With an index on customer_id, the database walks the B-tree in O(log n) time and fetches only the matching rows.

B-Trees: The Default Index Structure

Most relational databases (PostgreSQL, MySQL, SQLite) default to B-tree indexes. A B-tree keeps data sorted and balanced, so every lookup, insert, and delete is O(log n).

         [50]
        /    \
    [25]      [75]
   /    \    /    \
 [10] [30] [60] [90]

The key insight: the tree stays balanced regardless of how you insert data. This means query time doesn't degrade as your table grows from 10K to 10M rows, it just grows logarithmically.

When B-Trees Shine

B-trees are excellent for:

Equality lookups: WHERE id = 5
Range queries: WHERE created_at BETWEEN '2024-01-01' AND '2024-12-31'
Sorting: ORDER BY last_name (the index is already sorted)
Prefix matching: WHERE name LIKE 'Alex%'

They're poor for suffix or mid-string matching (LIKE '%lex') because the tree is sorted from left to right.

Composite Indexes and Column Order

A composite index covers multiple columns. Column order matters more than most people realize.

CREATE INDEX idx_orders_customer_status ON orders (customer_id, status);

This index is useful for:

-- Uses the index fully
SELECT * FROM orders WHERE customer_id = 42 AND status = 'pending';

-- Uses the index (leading column match)
SELECT * FROM orders WHERE customer_id = 42;

-- Does NOT use the index
SELECT * FROM orders WHERE status = 'pending';

This is the leftmost prefix rule: a composite index can only be used if the query filters on a prefix of the indexed columns in order. Putting a low-cardinality column first (like status which might only have 3–4 values) wastes the index for most queries.

Rule of thumb: put the most selective column first, unless you have a specific query pattern that dictates otherwise.

The Hidden Cost of Indexes

Indexes are not free. Every write operation, INSERT, UPDATE, DELETE, must also update every index on that table.

Write cost = table write + (1 write per index)

A table with 8 indexes pays 9x the write cost of a table with no indexes. For tables that are read-heavy (dashboards, analytics), this is usually fine. For tables that are write-heavy (event logs, audit trails, message queues), over-indexing is a real bottleneck.

Before adding an index, ask:

What query is slow and why?
How often is this table written to vs. read from?
Does an index already exist that covers this query partially?

Using EXPLAIN to Validate

Don't guess, use EXPLAIN (or EXPLAIN ANALYZE in PostgreSQL) to see the query plan:

EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42 AND status = 'pending';

Look for these in the output:

Index Scan, the database used your index
Seq Scan, full table scan, usually means no suitable index
Bitmap Heap Scan, used the index to collect row locations, then fetched them in bulk (efficient for large result sets)

A common mistake is adding an index and assuming it's being used. Always verify with EXPLAIN.

Partial Indexes

If you only query a subset of rows frequently, a partial index lets you index just that subset:

-- Only index pending orders, not the entire orders table
CREATE INDEX idx_pending_orders ON orders (created_at)
WHERE status = 'pending';

This is smaller, faster to maintain, and often faster to query than a full index on the same column. It's a great tool when one value of a column dominates your query patterns.

Putting It Together

Good indexing is about understanding your access patterns, not just making queries faster. A few principles I follow:

Index foreign keys, almost always worth it, since joins happen constantly
Verify with EXPLAIN before and after adding indexes
Don't index everything, be selective, especially on write-heavy tables
Review unused indexes, most databases let you see which indexes are never scanned; drop them

Indexes are one of the highest-leverage performance tools available to you, and understanding the mechanics behind them makes the difference between adding an index that helps and adding one that quietly makes everything worse.