Re: Failed recovery with new faster 2PC code

From: Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com>
To: Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>
Subject: Re: Failed recovery with new faster 2PC code
Date: 2017-04-18 08:17:28
Message-ID: CAMGcDxeFVRi2uqhp8OD2nPAX99ChikHbKXcJTAun1mPpCx+Awg@mail.gmail.com
Lists: pgsql-hackers

Hi,

There was a bug in the 2PC redo remove code path. Because of it, autovacuum
would conclude that the 2PC transaction was gone and would remove the
corresponding clog entry earlier than it should.
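
To make the failure mode described above concrete, here is a minimal toy
model in Python. It is only a sketch under stated assumptions: the class,
its fields and the run() helper are all hypothetical names, it is not the
attached patch, it does not mirror the actual PostgreSQL code, and it
ignores xid wraparound. It only shows that once recovery loses track of a
still-prepared transaction, the truncation horizon moves past its xid and
the later COMMIT PREPARED cannot find its clog entry.

```python
# Toy model only. All names here are hypothetical and do not correspond
# to PostgreSQL internals. Xid wraparound is deliberately ignored.


class ToyCluster:
    def __init__(self, next_xid):
        self.next_xid = next_xid
        # xid -> "in_progress" | "committed"
        self.clog = {}
        # Prepared xids that recovery still knows about.
        self.tracked_prepared = set()

    def redo_prepare(self, xid):
        """Replay of a PREPARE record: remember the xid and its status."""
        self.clog[xid] = "in_progress"
        self.tracked_prepared.add(xid)

    def oldest_needed_xid(self):
        """Oldest xid whose clog entry must be kept."""
        return min(self.tracked_prepared, default=self.next_xid)

    def truncate_clog(self):
        """Stand-in for the truncation that a vacuum run can trigger."""
        horizon = self.oldest_needed_xid()
        for xid in [x for x in self.clog if x < horizon]:
            del self.clog[xid]

    def commit_prepared(self, xid):
        """COMMIT PREPARED needs the clog entry to still be there."""
        if xid not in self.clog:
            raise LookupError(f"clog entry for xid {xid} already truncated")
        self.clog[xid] = "committed"
        self.tracked_prepared.discard(xid)


def run(lose_tracking):
    cluster = ToyCluster(next_xid=200)
    cluster.redo_prepare(100)
    if lose_tracking:
        # Assumption: this stands in for the effect of the buggy redo
        # remove path, whatever its exact cause was.
        cluster.tracked_prepared.discard(100)
    cluster.truncate_clog()
    try:
        cluster.commit_prepared(100)
        return "committed"
    except LookupError:
        return "clog entry missing"
```

Calling run(False) keeps xid 100 tracked, so truncation stops at 100 and
the commit succeeds. Calling run(True) drops the tracking first, so the
horizon jumps to next_xid and the entry for 100 is gone by commit time.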

Please find the bug fix attached: 2pc_redo_remove_bug.patch.

I have been testing this on top of Michael's 2pc-restore-fix.patch, and
things have looked OK for the past hour or so. I will keep it running for
longer.

Jeff, thanks for these very useful scripts. I am going to make a habit of
running them on my side from now on. Do you have any other scripts that
I could try against these patches? Please let me know.

Regards,
Nikhils

On 18 April 2017 at 12:09, Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com> wrote:

>
>
> On 17 April 2017 at 15:02, Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com>
> wrote:
>
>>
>>
>>> >> commit 728bd991c3c4389fb39c45dcb0fe57e4a1dccd71
>>> >> Author: Simon Riggs <simon(at)2ndQuadrant(dot)com>
>>> >> Date: Tue Apr 4 15:56:56 2017 -0400
>>> >>
>>> >> Speedup 2PC recovery by skipping two phase state files in normal
>>> path
>>> >
>>> > Thanks Jeff for your tests.
>>> >
>>> > So that's now two crash bugs in as many days and lack of clarity about
>>> > how to fix it.
>>> >
>>>
>>
> The issue seems to be that a prepared transaction is yet to be committed.
> But autovacuum comes in and causes the clog to be truncated beyond this
> prepared transaction ID in one of the runs.
>
> We only add the corresponding pgproc entry for a surviving 2PC transaction
> on completion of recovery. So could be a race condition here. Digging in
> further.
>
> Regards,
> Nikhils
> --
> Nikhil Sontakke http://www.2ndQuadrant.com/
> PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services
>

--
Nikhil Sontakke http://www.2ndQuadrant.com/
PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services

Attachment                 Content-Type              Size
2pc_redo_remove_bug.patch  application/octet-stream  403 bytes
2pc-restore-fix.patch      application/octet-stream  5.0 KB
