Speed up the removal of WAL files

From: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Speed up the removal of WAL files
Date: 2017-11-17 06:35:41
Message-ID: 0A3221C70F24FB45833433255569204D1F81B0C8@G01JPEXMBYT05
Lists: pgsql-hackers


The attached patch speeds up the removal of WAL files in the old timelines. I'll add this to the next CF.


We need to meet a strict availability requirement from a potential customer. They will use synchronous streaming replication. The allowed failover duration, from the failure through failure detection to failover completion, is 10 seconds. Even one second is precious.

During testing on a fast machine with an SSD, we observed about 2 seconds between these messages. There were no other messages between them.

LOG: archive recovery complete
LOG: MultiXact member wraparound protections are now enabled


From examining the source code, RemoveNonParentXlogFiles() seems to account for the time. It syncs the pg_wal directory every time it deletes a WAL file. max_wal_size was set to 48GB, so about 1,000 WAL files were probably deleted, and the pg_wal directory was therefore synced about that many times.


The patch changes this to unlink() all the WAL files first, then sync the pg_wal directory once at the end.
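To illustrate the idea outside of PostgreSQL, here is a minimal standalone sketch of the two strategies. It is not the attached patch. The names sync_dir(), create_files() and remove_files(), and the seg_NNNNNN file naming, are hypothetical and only serve the example. It assumes a POSIX system where a directory can be opened read-only and passed to fsync().

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* fsync a directory so that earlier unlink()s inside it become durable. */
static int
sync_dir(const char *dir)
{
    int fd = open(dir, O_RDONLY);
    int rc;

    if (fd < 0)
        return -1;
    rc = fsync(fd);
    /* Some filesystems cannot fsync a directory; treat that as non-fatal. */
    if (rc != 0 && (errno == EINVAL || errno == EBADF))
        rc = 0;
    close(fd);
    return rc;
}

/* Create n empty files named seg_000000, seg_000001, ... in dir. */
int
create_files(const char *dir, int n)
{
    char path[4096];

    assert(n >= 0);
    for (int i = 0; i < n; i++)
    {
        int fd;

        snprintf(path, sizeof(path), "%s/seg_%06d", dir, i);
        fd = open(path, O_CREAT | O_WRONLY, 0600);
        if (fd < 0)
            return -1;
        close(fd);
    }
    return 0;
}

/*
 * Remove the n files made by create_files().
 * sync_each != 0: sync the directory after every unlink (one sync per file).
 * sync_each == 0: unlink everything, then sync the directory once.
 * Returns the number of directory syncs issued, or -1 on error.
 */
int
remove_files(const char *dir, int n, int sync_each)
{
    char path[4096];
    int  syncs = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++)
    {
        snprintf(path, sizeof(path), "%s/seg_%06d", dir, i);
        if (unlink(path) != 0)
            return -1;
        if (sync_each)
        {
            if (sync_dir(dir) != 0)
                return -1;
            syncs++;
        }
    }
    if (!sync_each)
    {
        if (sync_dir(dir) != 0)
            return -1;
        syncs++;
    }
    return syncs;
}
```

With n files, the first mode issues n directory syncs and the second issues exactly one, which is where the saving should come from. Calling create_files() and then remove_files() in each mode on a scratch directory, with a timer around the removal, is one way to compare them on a given disk.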

Unfortunately, the original machine is no longer available, so I confirmed the speedup on a VM with an HDD.

[time to remove 1,000 WAL files including the directory sync]
unpatched: 2.45 seconds
patched: 0.81 seconds

Takayuki Tsunakawa

Attachment: speedup_wal_removal.patch (application/octet-stream, 945 bytes)

