Re: Bug in ProcArrayApplyRecoveryInfo for snapshots crossing 4B, breaking replicas

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: "Bossart, Nathan" <bossartn(at)amazon(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Bug in ProcArrayApplyRecoveryInfo for snapshots crossing 4B, breaking replicas
Date: 2022-01-26 18:31:00
Message-ID: 90de4d28-f3a3-c71c-c458-a8deef6af410@enterprisedb.com
Lists: pgsql-hackers

On 1/25/22 04:25, Michael Paquier wrote:
> On Mon, Jan 24, 2022 at 10:45:48PM +0100, Tomas Vondra wrote:
>> On 1/24/22 22:28, Bossart, Nathan wrote:
>>>> Attached patch is fixing this by just sorting the XIDs logically. The
>>>> xidComparator is meant for places that can't do logical ordering. But
>>>> these XIDs come from RUNNING_XACTS, so they actually come from the same
>>>> wraparound epoch (so sorting logically seems perfectly fine).
>>>
>>> The patch looks reasonable to me.
>>
>> Thanks!
>
> Could it be possible to add a TAP test? One idea would be to rely on
> pg_resetwal -x and -e close to the 4B limit to set up a node before
> stressing the scenario of this bug, so that would be rather cheap.

I actually tried doing that, but I was not very happy with the result.
The test has to call pg_resetwal, but then it also has to fake pg_xact
data and so on, which seemed a bit ugly, so I did not include the test
in the patch.

But maybe there's a better way to do this, so here it is. I've kept the
test as a separate patch, so that it can be applied without the fix to
verify that it actually triggers the issue.
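
To make the difference between the two orderings discussed above
concrete, here is a minimal standalone sketch. It is not the attached
patch: the names xid_numeric_cmp and xid_logical_cmp are made up for
illustration, the TransactionId typedef is redeclared locally, and
special XIDs are ignored. It assumes, as stated above, that all XIDs
being sorted come from the same RUNNING_XACTS record and so lie within
2^31 of each other.

```c
#include <stdint.h>
#include <stdlib.h>

/* Local stand-in for PostgreSQL's 32-bit transaction ID type. */
typedef uint32_t TransactionId;

/*
 * Plain numeric ordering of the raw 32-bit values (hypothetical name).
 * XIDs assigned just after the 4B boundary sort *before* XIDs assigned
 * just ahead of it, although they are logically newer.
 */
static int
xid_numeric_cmp(const void *a, const void *b)
{
    TransactionId x = *(const TransactionId *) a;
    TransactionId y = *(const TransactionId *) b;

    if (x > y)
        return 1;
    if (x < y)
        return -1;
    return 0;
}

/*
 * Logical (modulo-2^32) ordering (hypothetical name). The unsigned
 * difference is reinterpreted as a signed 32-bit value, so an XID that
 * is "a little ahead" of another compares as greater even across the
 * wraparound point. Only meaningful when the inputs are within 2^31 of
 * each other; normal XIDs only. The unsigned-to-signed conversion is
 * implementation-defined in C, but behaves as two's complement on
 * common compilers.
 */
static int
xid_logical_cmp(const void *a, const void *b)
{
    TransactionId x = *(const TransactionId *) a;
    TransactionId y = *(const TransactionId *) b;
    int32_t     diff = (int32_t) (x - y);

    if (diff > 0)
        return 1;
    if (diff < 0)
        return -1;
    return 0;
}
```

Passing either function to qsort() over an array such as
{3, 4294967290, 10, 4294967295} shows the difference: the numeric
comparator puts 3 and 10 first, while the logical comparator keeps
4294967290 and 4294967295 ahead of the post-wraparound XIDs 3 and 10.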

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment                                                      Content-Type  Size
0001-Add-TAP-test.patch                                         text/x-patch  5.1 KB
0002-Fix-ordering-of-XIDs-in-ProcArrayApplyRecoveryInfo.patch   text/x-patch  4.4 KB
