Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From: "Imseih (AWS), Sami" <simseih(at)amazon(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "michael(at)paquier(dot)xyz" <michael(at)paquier(dot)xyz>
Cc: "alvherre(at)alvh(dot)no-ip(dot)org" <alvherre(at)alvh(dot)no-ip(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUG] Panic due to incorrect missingContrecPtr after promotion
Date: 2022-06-24 16:17:34
Message-ID: 7AF889CF-315C-4FB7-88C0-734FE3F01DE9@amazon.com
Lists: pgsql-hackers

> Thus, I still don't see what have happened at Imseih's hand, but I can
> cause PANIC with a bit tricky steps, which I don't think valid. This
> is what I wanted to know the exact steps to cause the PANIC.

> The attached 1 is the PoC of the TAP test (it uses system()..), and
> the second is a tentative fix for that. (I don't like the fix, too,
> though...)

It has been difficult to get a generic repro, but the way we reproduce
it is through our test suite. To give more details, we are running tests
in which we constantly fail over and promote standbys. The issue
surfaces after we have gone through a few promotions, which occur
every few hours or so (not really important, but it gives context).

I am adding some additional debugging to see if I can draw a better
picture of what is happening. I will also give aborted_contrec_reset_3.patch
a go, although I suspect it will not handle the specific case we are dealing with.

Regards,

Sami Imseih
Amazon Web Services
