Re: Reorderbuffer crash during recovery

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reorderbuffer crash during recovery
Date: 2019-12-17 09:02:10
Message-ID: CAA4eK1LrOzJj8K+i6YzyzmQJqM2_M3xvvGsCcCZXAMgCzyLY_w@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Wed, Dec 11, 2019 at 11:13 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
>
> It sets the final_lsn here so that it can iterate from the start_lsn
> to final_lsn and clean up the serialized files in the
> ReorderBufferRestoreCleanup function. One solution we were considering
> was to store the LSN of the last serialized change while serializing
> and to set final_lsn from it in the above case where it crashes, as in
> the code below:

Sure, we can do something along the lines of what you are suggesting,
but why can't we update final_lsn at the time of serializing the
changes? If we do that, then we don't even need to compute it
separately in ReorderBufferAbortOld.

Let me try to explain the problem and the proposed solutions.
Currently, after serializing the changes, we remove them from the
'changes' list of ReorderBufferTXN. If the system then crashes for any
reason, we can't compute final_lsn after the restart. That leads to an
access violation in ReorderBufferAbortOld, which tries to read the
changes list of ReorderBufferTXN to compute final_lsn.

We could fix it by either tracking 'last_serialized_change' as
proposed by Vignesh or we could update the final_lsn while we
serialize the changes.
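
To make the second option concrete, here is a minimal sketch. It uses
toy stand-in types, not the real ReorderBufferTXN code; every name in it
(ToyTxn, toy_serialize, toy_cleanup_upper_bound, and so on) is
hypothetical and only illustrates the idea of advancing final_lsn while
the changes are being spilled, so that later cleanup no longer depends
on the in-memory changes list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for illustration only; NOT the PostgreSQL definitions. */
typedef uint64_t ToyLsn;
#define TOY_INVALID_LSN ((ToyLsn) 0)
#define TOY_MAX_CHANGES 8

typedef struct ToyTxn
{
    ToyLsn changes[TOY_MAX_CHANGES]; /* LSNs of changes still in memory */
    size_t nchanges;                 /* number of in-memory changes */
    ToyLsn final_lsn;                /* upper bound used for spill-file cleanup */
} ToyTxn;

static void
toy_txn_init(ToyTxn *txn)
{
    txn->nchanges = 0;
    txn->final_lsn = TOY_INVALID_LSN;
}

static void
toy_add_change(ToyTxn *txn, ToyLsn lsn)
{
    assert(txn->nchanges < TOY_MAX_CHANGES);
    txn->changes[txn->nchanges++] = lsn;
}

/*
 * Spill all in-memory changes. As each change is "written", remember its
 * LSN in final_lsn, then drop the in-memory list.
 */
static void
toy_serialize(ToyTxn *txn)
{
    for (size_t i = 0; i < txn->nchanges; i++)
    {
        /* writing the change to a spill file would happen here */
        if (txn->final_lsn < txn->changes[i])
            txn->final_lsn = txn->changes[i];
    }
    txn->nchanges = 0;
}

/*
 * Upper bound for cleaning up spill files. It reads only final_lsn, so it
 * is safe to call even when the in-memory changes list is empty.
 */
static ToyLsn
toy_cleanup_upper_bound(const ToyTxn *txn)
{
    return txn->final_lsn;
}
```

With this shape, the abort path can use final_lsn directly after a
spill, instead of looking at the last entry of a list that may already
be empty.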

IIUC, commit df9f682c7bf81674b6ae3900fd0146f35df0ae2e [1] tried to fix
a related issue, which in turn led to this other problem. Alvaro,
Andres, do you have any suggestions?

[1] -
commit df9f682c7bf81674b6ae3900fd0146f35df0ae2e
Author: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Date: Fri Jan 5 12:17:10 2018 -0300

Fix failure to delete spill files of aborted transactions

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
