Published on
·5 min read

PostgreSQL Process Architecture

Authors

Original Article

Process-per-Connection Model

PostgreSQL spawns a new process for each client connection. When a client connects, the postmaster (main PostgreSQL process) forks a backend process to handle that connection.

This is different from thread-per-connection models used by MySQL. Each PostgreSQL backend has its own memory space, file descriptors, and execution context.

The advantage is isolation. A crash in one backend doesn't affect others. Memory corruption is contained. The disadvantage is resource consumption. Each process costs memory for its stack, heap, and kernel structures.

For nn concurrent connections:

Memorytotaln×(work_mem+stack+shared overhead)\text{Memory}_{\text{total}} \approx n \times (\text{work\_mem} + \text{stack} + \text{shared overhead})

With 1000 connections and 4MB work_mem per connection, backends alone consume 4GB. Connection poolers like PgBouncer mitigate this by multiplexing many clients onto fewer backends.

Shared Memory

While backends have separate memory spaces, they share data through System V shared memory (or mmap on modern systems). The shared buffer pool holds cached table and index pages.

When a backend needs a page, it first checks shared buffers. If present (cache hit), it reads directly. If absent (cache miss), it reads from disk into shared buffers, then accesses the cached copy.

Hit Rate=HitsHits+Misses\text{Hit Rate} = \frac{\text{Hits}}{\text{Hits} + \text{Misses}}

A healthy database maintains hit rates above 99%. PostgreSQL tracks these statistics in pg_stat_database.

Shared buffers are protected by lightweight locks (LWLocks). Multiple backends can read the same buffer concurrently. Writes require exclusive locks. Lock contention becomes visible at high concurrency.

Write-Ahead Logging (WAL)

PostgreSQL guarantees durability through WAL. Before any data modification, the change is logged to the WAL. If the database crashes, WAL replay reconstructs committed transactions.

The WAL is append-only. Writes are sequential, which is efficient on both spinning disks and SSDs. The data files are updated later, when convenient.

Each WAL record contains:

(LSN,transaction_id,operation,data)(\text{LSN}, \text{transaction\_id}, \text{operation}, \text{data})

The Log Sequence Number (LSN) is a monotonically increasing identifier. It's the position in the WAL stream where this record lives.

A transaction commits when its final WAL record hits disk. The commit response to the client can return before data pages are written. This separation is what makes commits fast.

Checkpointer

Without checkpoints, crash recovery would replay the entire WAL history. Checkpoints establish points where all data pages are consistent with the WAL.

The checkpointer periodically:

  1. Writes all dirty buffers to disk
  2. Records the checkpoint LSN
  3. Removes WAL segments older than the checkpoint

After a crash, recovery starts from the last checkpoint, not from the beginning of time. More frequent checkpoints mean faster recovery but more I/O during normal operation.

Recovery TimeWAL Since Last Checkpoint\text{Recovery Time} \propto \text{WAL Since Last Checkpoint}

PostgreSQL spreads checkpoint writes over checkpoint_completion_target fraction of the checkpoint interval. This prevents I/O spikes that could impact query performance.

Background Writer

The background writer proactively flushes dirty buffers to disk. Its goal is to ensure that backends rarely have to wait for buffer eviction.

When a backend needs a buffer and none are free, it must evict a dirty buffer: write to disk, then reuse. This adds latency to the query. The background writer prevents this by maintaining a pool of clean buffers.

The writer uses a clock-sweep algorithm to find eviction candidates. Buffers have usage counts that decay over time. Low-usage buffers are evicted first.

Autovacuum

PostgreSQL uses multi-version concurrency control (MVCC). Updates create new row versions rather than overwriting. Deletes mark rows as dead. Old versions accumulate until vacuumed.

Autovacuum runs in the background:

  • Removes dead row versions, reclaiming space
  • Updates visibility maps for index-only scans
  • Updates planner statistics for query optimization
  • Prevents transaction ID wraparound

Without vacuum, tables bloat and performance degrades. Transaction ID wraparound can cause data loss. Autovacuum is not optional.

The launcher process monitors tables and spawns worker processes for tables needing vacuum. Configuration parameters control how aggressively autovacuum runs.

Statistics Collector

PostgreSQL collects runtime statistics: rows read, blocks hit, sequential scans vs index scans. This data feeds the query planner and helps DBAs understand workload patterns.

Each backend reports statistics to the collector through shared memory or temporary files. The collector aggregates these into system catalog views like pg_stat_user_tables.

The planner uses statistics to estimate selectivity. If a table has 1 million rows and a filter matches 1%, the planner estimates 10,000 rows. Stale statistics lead to poor query plans.

ANALYZE table_name;

This command updates statistics manually. Autovacuum also runs analyze periodically.

Practical Implications

Understanding process architecture helps with:

  • Connection limits: Don't set max_connections to 10,000. Use connection pooling.
  • Memory tuning: shared_buffers should be 25% of RAM on dedicated servers. work_mem multiplied by expected concurrent queries shouldn't exceed available RAM.
  • Monitoring: Watch for checkpoint spikes, vacuum lag, and buffer cache hit rates.
  • Troubleshooting: Each backend is a process. ps aux | grep postgres shows what's running. pg_stat_activity shows what each backend is doing.

The process model is PostgreSQL's oldest design decision. It predates threads being widely available on Unix. Modern PostgreSQL has optimized heavily within this constraint, but the fundamental architecture remains.