From: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
Subject: Speed up the removal of WAL files
The attached patch speeds up the removal of WAL files from old timelines. I'll add it to the next CommitFest.
We need to meet a strict availability requirement for a potential customer. They will use synchronous streaming replication, and the allowed failover time, from the failure through failure detection to failover completion, is 10 seconds. Even one second is precious.
During testing on a fast machine with an SSD, we observed about 2 seconds between the following two messages, with no other messages in between:
LOG: archive recovery complete
LOG: MultiXact member wraparound protections are now enabled
Judging from the source code, RemoveNonParentXlogFiles() seems to account for this time: it syncs the pg_wal directory every time it deletes a WAL file. max_wal_size was set to 48GB, so about 1,000 WAL files were probably deleted, and the pg_wal directory was therefore synced about 1,000 times.
The patch changes this to unlink() the WAL files first, then sync the pg_wal directory only once at the end.
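To make the difference concrete, here is a minimal POSIX C sketch contrasting the two patterns. It is not the patch or PostgreSQL source code; the helpers fsync_dir(), remove_files_sync_each() and remove_files_sync_once() are hypothetical names used only for illustration, and it assumes a platform (such as Linux) where a directory opened with O_RDONLY can be passed to fsync().

```c
/*
 * Illustrative sketch only (hypothetical helpers, not PostgreSQL code):
 * deleting N files with a directory fsync after each unlink() versus
 * unlinking all of them and fsyncing the directory once.
 */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* fsync a directory so that changes to its entries (e.g. unlinks) are durable.
 * Assumes fsync() on an O_RDONLY directory descriptor is supported. */
static int fsync_dir(const char *dirpath)
{
    int fd = open(dirpath, O_RDONLY);
    if (fd < 0)
        return -1;
    int rc = fsync(fd);
    close(fd);
    return rc;
}

/* Unpatched pattern: one directory fsync per removed file, N fsyncs total. */
static int remove_files_sync_each(const char *dirpath, char *const paths[], int n)
{
    for (int i = 0; i < n; i++) {
        if (unlink(paths[i]) != 0)
            return -1;
        if (fsync_dir(dirpath) != 0)
            return -1;
    }
    return 0;
}

/* Patched pattern: unlink every file first, then a single directory fsync. */
static int remove_files_sync_once(const char *dirpath, char *const paths[], int n)
{
    for (int i = 0; i < n; i++) {
        if (unlink(paths[i]) != 0)
            return -1;
    }
    return fsync_dir(dirpath);
}
```

With about 1,000 files, the first variant issues about 1,000 directory fsyncs and the second issues one. The trade-off is that, until the final fsync completes, a crash may leave some of the already-unlinked files visible after restart.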
Unfortunately, the original machine is no longer available, so I confirmed the speedup on a VM with an HDD.
[time to remove 1,000 WAL files including the directory sync]
nonpatched: 2.45 seconds
patched: 0.81 seconds