Change pgarch_readyXlog() to return .history files first

From: David Steele <david(at)pgmasters(dot)net>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>
Subject: Change pgarch_readyXlog() to return .history files first
Date: 2018-12-13 16:53:53
Message-ID: 929068cf-69e1-bba2-9dc0-e05986aed471@pgmasters.net
Lists: pgsql-hackers

Hackers,

The alphabetical ordering of pgarch_readyXlog() means that on promotion
000000010000000100000001.partial will be archived before 00000002.history.

This appears harmless, but the .history files are what other potential
primaries use to decide what timeline they should pick. The additional
latency of compressing/transferring the much larger partial file delays
archiving of the .history file, which greatly increases the chance that
another primary will promote to the same timeline.

Teach pgarch_readyXlog() to return .history files first (and in order)
to reduce the window where this can happen. This won't prevent all
conflicts, but it is a simple change and should greatly reduce
real-world occurrences.
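
To illustrate the intended ordering, here is a minimal standalone sketch.
It is not the attached patch and not the actual pgarch.c code. The helper
names (is_history_file, archive_before, next_to_archive) are hypothetical,
and it assumes the candidates are plain file names with any ".ready"
status suffix already stripped.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if the file name ends in ".history". */
static bool
is_history_file(const char *name)
{
    static const char suffix[] = ".history";
    size_t      nlen = strlen(name);
    size_t      slen = sizeof(suffix) - 1;

    return nlen >= slen && strcmp(name + nlen - slen, suffix) == 0;
}

/*
 * Hypothetical helper: true if "candidate" should be archived before
 * "best".  A .history file always beats a non-history file; within the
 * same class, fall back to plain alphabetical (strcmp) order.
 */
static bool
archive_before(const char *candidate, const char *best)
{
    bool        cand_hist = is_history_file(candidate);
    bool        best_hist = is_history_file(best);

    if (cand_hist != best_hist)
        return cand_hist;

    return strcmp(candidate, best) < 0;
}

/*
 * Hypothetical helper: pick the next file to archive from an array of
 * ready file names.  Returns NULL if the array is empty.
 */
static const char *
next_to_archive(const char *const *ready, size_t n)
{
    const char *best = NULL;

    for (size_t i = 0; i < n; i++)
    {
        if (best == NULL || archive_before(ready[i], best))
            best = ready[i];
    }

    return best;
}
```

With purely alphabetical ordering, 000000010000000100000001.partial sorts
ahead of 00000002.history because the first differing character is '1'
versus '2'. With the comparison above, the .history file is chosen first.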

I also think we should consider back-patching this change. It's hard to
imagine that archive commands would have trouble with this reordering,
and the current ordering causes real pain in HA clusters.

Regards,
--
-David
david(at)pgmasters(dot)net

Attachment Content-Type Size
history-files-first-v1.patch text/plain 3.4 KB
