Optimize CPU usage of dropping buffers during recovery

From: Jingtang Zhang <mrdrivingduck(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Optimize CPU usage of dropping buffers during recovery
Date: 2026-01-27 17:41:52
Message-ID: 77956427-F0BC-4B2C-9114-81859CDA8E06@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi hackers,

I happen to find a case where a transaction involves dropping a relation
file, and a relation happens to be empty, the WAL recovery will waste
a lot of time on scanning the whole buffer pool.

In DropRelationBuffers/DropRelationsAllBuffers, we will take fast path
to avoid scanning whole buffer pool, if the buffers to be invalidated
are less than typically 1/32 of buffer pool. However, since no more WAL
records are referencing an empty relation, if we want to drop it, startup
process will never initialize smgr_cached_nblocks for that relation,
causing fallback to slow path.

The situation can be like refreshing a materialized view whose query
returns no row, or truncating a table and then loading it but the query
returns no row. It will generate WAL sequence like:

rmgr: Standby len (rec/tot): 42/ 42, tx: 780, lsn: 0/41238520, prev 0/41238490, desc: LOCK xid 780 db 16384 rel 16416
rmgr: Storage len (rec/tot): 42/ 42, tx: 780, lsn: 0/412385F0, prev 0/412385C0, desc: CREATE base/16384/16416
...
rmgr: Transaction len (rec/tot): 1073/ 1073, tx: 783, lsn: 0/4136BF70, prev 0/41369F60, desc: COMMIT 2026-01-27 08:07:49.953301 UTC; rels: base/16384/16416

This makes me think about initializing the cached nblocks to zero when
redo CREATE record. Note that the 0-size may not correct, because the
relation may have been zeroextended before server crash, and recovery
starts from an earlier checkpoint. But we can correct that by invalidating
the cached value when we really meet a WAL record referencing a block of
that relation, so that a later smgrnblocks() will fill-in the correct
file length. It can avoid extending the relation in the middle of a file
by mistake.

Quickly proposed a patch for it, with a Perl program for judging the
performance difference.

In my environment with 16GB shared buffers, -O2 optimized, debugging off,
I've run a test of 10 client * 500 transactions using such workload, made
the server crash and poll it until recovery is done. The CPU time:

w/o patch: CPU user: 77.58 s, system: 0.27 s
patched: CPU user: 0.14 s, system: 0.09 s

Without the patch, the CPU is wasted on BufTagMatchesRelFileLocator:

Overhead Command Shared Object Symbol
- 98.23% postgres postgres [.] DropRelationsAllBuffers
98.21% DropRelationsAllBuffers
smgrdounlinkall
DropRelationFiles
xact_redo_commit
xact_redo
PerformWalRecovery
StartupXLOG
StartupProcessMain
postmaster_child_launch
StartChildProcess
PostmasterStateMachine
ServerLoop.isra.0
0.06% postgres postgres [.] hash_search_with_hash_value

Not sure if there will be any problem of correctness. Also, not sure about
what to do about INIT fork.

Any thoughts?

--
Regards, Jingtang

Attachment Content-Type Size
0001-Optimize-CPU-usage-of-dropping-buffers-during-recove.patch application/octet-stream 4.2 KB

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2026-01-27 17:42:51 Re: pgsql: Prevent invalidation of newly synced replication slots.
Previous Message Jacob Champion 2026-01-27 17:40:32 Re: Custom oauth validator options