PostgreSQL

Write efficient PostgreSQL queries and design schemas with proper indexing and patterns.

Install
$clawhub install pg

Indexes I Forget to Create

  • Partial index WHERE active = true—80% smaller when most rows inactive; suggest for status columns

  • Expression index ON lower(email)—must match query exactly; without it, WHERE lower(email) scans

  • Covering index INCLUDE (name, email)—enables index-only scan; check EXPLAIN for "Heap Fetches"

  • Foreign key columns—not auto-indexed in PG; JOINs and ON DELETE CASCADE need them

  • Composite index order matters—(a, b) helps WHERE a = ? but not WHERE b = ?

Index Traps

  • Unused indexes hurt every INSERT/UPDATE—query pg_stat_user_indexes for idx_scan = 0, drop them

  • Too many indexes on write-heavy tables—balance carefully

  • Index on low-cardinality column (boolean, status) often useless—PG prefers seq scan

  • LIKE '%suffix' can't use B-tree—need pg_trgm GIN index or reverse() expression index

Query Patterns I Underuse

  • SELECT FOR UPDATE SKIP LOCKED—job queue without external tools; skip rows being processed

  • pg_advisory_lock(key)—application-level mutex without table; unlock explicitly or on disconnect

  • IS NOT DISTINCT FROM—NULL-safe equality; cleaner than (a = b OR (a IS NULL AND b IS NULL))

  • DISTINCT ON (x) ORDER BY x, y—first row per group without subquery; PG-specific but powerful

Connection Management (Often Ignored)

  • PgBouncer essential with >50 connections—each PG connection uses ~10MB; pool at transaction level

  • statement_timeout = '30s' per role—prevents runaway queries from killing database

  • idle_in_transaction_session_timeout = '5min'—kills abandoned transactions holding locks

  • Default 100 max_connections too low for production, too high wastes memory—tune based on RAM

Data Types I Get Wrong

  • SERIAL deprecated—use GENERATED ALWAYS AS IDENTITY

  • TIMESTAMP without timezone—almost always wrong; use TIMESTAMPTZ, PG stores as UTC

  • Float for money—use NUMERIC(12,2) or integer cents; float math breaks: 0.1 + 0.2 ≠ 0.3

  • VARCHAR(n) vs TEXT—no performance difference in PG; use TEXT unless constraint needed

Vacuum & Bloat (Never Think About)

  • High-UPDATE tables bloat—dead tuples accumulate; pg_repack reclaims without locks

  • VACUUM ANALYZE after bulk insert—updates statistics; query planner needs current data

  • Autovacuum lag on big tables—tune autovacuum_vacuum_cost_delay or manual vacuum

  • Transaction wraparound: if xid exhausted, DB stops—autovacuum prevents but monitor

EXPLAIN I Don't Read Right

  • Always EXPLAIN (ANALYZE, BUFFERS)—actual times + I/O; estimate-only misleads

  • "Heap Fetches: 1000" with index—missing columns, add INCLUDE to index

  • Seq scan not always bad—faster than index for >10-20% of table; check row estimates

  • "Rows" estimate way off—run ANALYZE or check if stats target too low

Full-Text Search Mistakes

  • Creating tsvector on the fly—precompute as stored generated column with GIN index

  • plainto_tsquery for user input—handles spaces without syntax errors; not to_tsquery

  • Missing language parameter—'english' stems words; 'simple' exact match

  • FTS is word-based—LIKE '%exact phrase%' still needed for substring match

Transaction Isolation

  • Default READ COMMITTED—phantom reads in reports; use REPEATABLE READ for consistency

  • SERIALIZABLE catches conflicts—but must handle 40001 error with retry loop

  • Long transactions block vacuum and hold locks—keep under seconds, not minutes