Database Indexing: B-trees, Composite Indexes & Performance

Every developer knows indexes make queries faster. Far fewer know why, which means they add indexes in the wrong places, miss the ones that matter, and occasionally make things worse. After spending years debugging slow queries in production PostgreSQL, here’s what actually matters.

TL;DR

An index is a sorted copy of a column (or columns) that PostgreSQL can binary-search instead of scanning every row
B-tree is the default and right choice for most cases — use Hash only for exact equality, GiST/GIN for full-text or JSON
Composite index column order matters: put the highest-cardinality, most-filtered column first
Partial indexes (WHERE status = 'pending') are a powerful underused tool — smaller, faster for specific queries
Indexes slow down INSERT/UPDATE/DELETE — don’t index every column on write-heavy tables
EXPLAIN ANALYZE shows exactly what PostgreSQL does; Seq Scan on a large table is your signal to add an index

Prerequisites

Basic SQL familiarity (SELECT, WHERE, JOIN)
PostgreSQL (examples use pg 15+, but concepts apply to MySQL/SQLite too)
Ability to run EXPLAIN ANALYZE on a query

What is an index, actually?

An index is a separate data structure that stores a sorted copy of one or more columns, along with pointers back to the actual rows. When you run SELECT * FROM users WHERE email = 'alice@example.com', without an index PostgreSQL reads every row in the table (a sequential scan) and checks each one. With an index on email, it binary-searches a sorted list and jumps directly to the matching row.

The default index type in PostgreSQL is a B-tree (balanced tree). B-trees are sorted, which means they support not just equality checks (=) but also ranges (>, <, BETWEEN), ordering (ORDER BY), and prefix matches (LIKE 'alice%').

The trade-off: the index is a separate structure that must be updated every time the indexed column changes. Every INSERT, UPDATE, or DELETE on an indexed table writes to the index too.

How do you create a basic index?

sql

-- Single-column index (most common)
CREATE INDEX idx_users_email ON users(email);

-- Unique index (also enforces uniqueness constraint)
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);

-- Index on a foreign key (almost always worth it)
CREATE INDEX idx_orders_user_id ON orders(user_id);

Foreign key columns are almost always worth indexing. PostgreSQL doesn’t create them automatically (unlike MySQL), and a JOIN users ON orders.user_id = users.id without an index on orders.user_id does a sequential scan of the orders table for every user.

How do composite indexes work, and does column order matter?

A composite (multi-column) index covers more than one column. The column order determines which queries it can accelerate:

sql

-- Composite index: (status, created_at)
CREATE INDEX idx_orders_status_created ON orders(status, created_at);

This index can serve:

WHERE status = 'pending' ✓
WHERE status = 'pending' AND created_at > '2026-01-01' ✓
WHERE status = 'pending' ORDER BY created_at DESC ✓
WHERE created_at > '2026-01-01' ✗ (can’t use index efficiently — leading column not filtered)

The left-prefix rule: PostgreSQL can use a composite index only when the query filters on a prefix of the index’s columns (starting from the left). Filtering on created_at alone doesn’t hit the index because status — the leftmost column — isn’t constrained.

Column order guidance:

Put the equality-filtered column first, range-filtered column second
Put the higher-selectivity column first (more unique values = more selective = better at narrowing rows)
If you query by status alone frequently AND by (status, created_at), the composite index covers both

What are partial indexes?

A partial index only covers rows that match a WHERE condition. This makes the index smaller (fewer rows), faster to scan, and cheaper to maintain.

sql

-- Only index pending orders — the set you actually query and update
CREATE INDEX idx_orders_pending ON orders(created_at) WHERE status = 'pending';

-- Only index non-deleted users
CREATE INDEX idx_users_active_email ON users(email) WHERE deleted_at IS NULL;

If your application queries almost exclusively on status = 'pending' (e.g., a job queue), a full index on (status, created_at) wastes space indexing millions of completed orders. The partial index covers only the relevant rows.

The query must include the same WHERE condition for PostgreSQL to use the partial index:

sql

-- Uses the partial index ✓
SELECT * FROM orders WHERE status = 'pending' AND created_at > '2026-04-01';

-- Does NOT use the partial index ✗ (no status = 'pending' filter)
SELECT * FROM orders WHERE created_at > '2026-04-01';

When do indexes hurt performance?

Indexes are not free. Each index on a table adds overhead to every write operation:

INSERT: PostgreSQL writes the new row to the table and to every index on that table
UPDATE: If the updated column is indexed, the old index entry is removed and a new one is added
DELETE: The row is removed from the table and all indexes

For read-heavy tables, this overhead is acceptable. For write-heavy tables (event logs, audit trails, time-series data), adding many indexes can make writes noticeably slower.

Other cases where indexes don’t help or hurt:

Low-cardinality columns: An index on a boolean column (is_active) with 90% true rows isn’t selective. PostgreSQL may decide a sequential scan is faster anyway.
Small tables: For tables under a few thousand rows, sequential scans are often faster than index lookups because the entire table fits in memory.
Unused indexes: Every index takes disk space and write overhead. PostgreSQL tracks index usage — check it:

sql

SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY schemaname, tablename;

Zero idx_scan means the index has never been used since the last stats reset. Drop it.

How do you read EXPLAIN ANALYZE output?

EXPLAIN ANALYZE runs the query and shows exactly what PostgreSQL did — which scan type it chose, how many rows it processed, and how long each step took.

sql

EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 42 ORDER BY created_at DESC LIMIT 10;

Key things to look for:

Seq Scan on a large table — PostgreSQL read every row. This is the signal that an index is missing or not being used.

plaintext

Seq Scan on orders  (cost=0.00..18340.00 rows=1 width=120) (actual time=45.3..380.2 rows=1 loops=1)
  Filter: (user_id = 42)
  Rows Removed by Filter: 850000

850,000 rows filtered to find 1. Add CREATE INDEX ON orders(user_id).

Index Scan — PostgreSQL used an index. Good for low row counts.

Bitmap Heap Scan — PostgreSQL used an index to find matching row locations, then fetched them in heap order. Common and efficient for moderate result sets.

Nested Loop / Hash Join / Merge Join — join strategies. A Hash Join on a large table using a Seq Scan on the inner table means the inner table needs an index.

The actual time=X..Y shows startup time and total time in milliseconds. Large gaps between cost estimate and actual time mean the planner’s statistics are stale — run ANALYZE tablename to update them.

What about index bloat?

Indexes accumulate dead entries over time as rows are updated and deleted. PostgreSQL’s autovacuum reclaims this space, but on high-write tables it can fall behind. Signs of index bloat:

Index is larger than the table it indexes
Query plans are suboptimal despite indexes existing

Check for bloat:

sql

SELECT pg_size_pretty(pg_relation_size('idx_orders_user_id')) AS index_size,
       pg_size_pretty(pg_relation_size('orders')) AS table_size;

To rebuild a bloated index without locking the table:

sql

REINDEX INDEX CONCURRENTLY idx_orders_user_id;

The CONCURRENTLY option rebuilds the index while allowing reads and writes to continue. Without it, REINDEX takes an exclusive lock.

Summary

B-tree indexes enable binary search instead of sequential scan — the difference between milliseconds and minutes on large tables
Composite indexes follow the left-prefix rule — put equality-filtered, high-cardinality columns first
Partial indexes cover a subset of rows — smaller, faster, cheaper to write for filtered workloads
Indexes slow writes — profile before indexing every column, and drop unused indexes
EXPLAIN ANALYZE is your ground truth — Seq Scan + high Rows Removed by Filter = missing index

FAQ

PostgreSQL didn’t use my index even though I created one. Why?

Several reasons. The table might be small enough that a sequential scan is faster. The column might have low cardinality (e.g., a boolean with mostly one value). Your query might use a function on the indexed column (WHERE lower(email) = ... won’t use an index on email — create a functional index instead: CREATE INDEX ON users(lower(email))). Or the planner’s statistics are stale — run ANALYZE tablename and try again.

Should I always index foreign key columns?

In PostgreSQL, yes in almost every case. PostgreSQL doesn’t automatically create indexes on foreign key columns (MySQL does). Without an index, any JOIN or lookup through the foreign key does a sequential scan of the child table. The main exception: junction tables in many-to-many relationships where you have a composite primary key that already covers the foreign key columns.

How many indexes is too many?

There’s no universal number, but if you have more indexes than columns, that’s worth reviewing. Run the unused indexes query above and drop anything with idx_scan = 0. For write-heavy tables, keep only the indexes that queries actually need. For read-heavy reporting tables, more indexes are acceptable.