Re: Reorderbuffer crash during recovery

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reorderbuffer crash during recovery
Date: 2020-01-16 03:47:18
Message-ID: CAFiTN-umcBr=SiQcf8TZq5dPQsiCYBfiWuU4AXX5MHUNnvp0jQ@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Tue, Dec 31, 2019 at 11:35 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Mon, Dec 30, 2019 at 11:17 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Fri, Dec 27, 2019 at 8:37 PM Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
> > >
> > > On 2019-Dec-27, vignesh C wrote:
> > >
> > > > > I felt Amit's solution also solves the problem. The attached patch has the
> > > > > fix based on the solution proposed.
> > > > Thoughts?
> > >
> > > This seems a sensible fix to me, though I didn't try to reproduce the
> > > failure.
> > >
> > > > @@ -2472,6 +2457,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
> > > > }
> > > >
> > > > ReorderBufferSerializeChange(rb, txn, fd, change);
> > > > + txn->final_lsn = change->lsn;
> > > > dlist_delete(&change->node);
> > > > ReorderBufferReturnChange(rb, change);
> > >
> > > > Should this be done inside ReorderBufferSerializeChange itself, instead
> > > of in its caller?
> > >
> >
> > makes sense. But, I think we should add a comment specifying the
> > reason why it is important to set final_lsn while serializing the
> > change.
>
> Fixed
>
> > > Also, would it be sane to verify that the TXN
> > > doesn't already have a newer final_lsn? Maybe as an Assert.
> > >
> >
> > I don't think this is a good idea because we update the final_lsn with
> > commit_lsn in ReorderBufferCommit after which we can try to serialize
> > the remaining changes. Instead, we should update it only if the
> > change_lsn value is greater than final_lsn.
> >
>
> Fixed.
> Thanks Alvaro & Amit for your suggestions. I have made the changes
> based on your suggestions. Please find the updated patch for the same.
> I have also verified the patch in back branches. A separate patch was
> required for the Release-10 branch; the patch for it is attached as
> 0001-Reorder-buffer-crash-while-aborting-old-transactions-REL_10.patch.
> Thoughts?

One minor comment. Otherwise, the patch looks fine to me.
+ /*
+ * We set final_lsn on a transaction when we decode its commit or abort
+ * record, but we never see those records for crashed transactions. To
+ * ensure cleanup of these transactions, set final_lsn to that of their
+ * last change; this causes ReorderBufferRestoreCleanup to do the right
+ * thing. Final_lsn would have been set with commit_lsn earlier when we
+ * decode it commit, no need to update in that case
+ */
+ if (txn->final_lsn < change->lsn)
+     txn->final_lsn = change->lsn;

s/decode it commit,/decode its commit,/
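
For readers following the thread, the guarded update in the hunk above can be tried in isolation with a minimal standalone sketch. The SketchTXN and SketchChange types and the sketch_note_serialized_change helper are hypothetical stand-ins, not the real reorderbuffer.c definitions. The only assumptions carried over from PostgreSQL are that an LSN is a 64-bit unsigned value and that 0 means "not set".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for PostgreSQL's XLogRecPtr (assumed 64-bit unsigned). */
typedef uint64_t XLogRecPtr;

/* Hypothetical, stripped-down transaction: only the field under discussion. */
typedef struct SketchTXN
{
    XLogRecPtr final_lsn;   /* 0 means "not set yet" */
} SketchTXN;

/* Hypothetical, stripped-down change record. */
typedef struct SketchChange
{
    XLogRecPtr lsn;
} SketchChange;

/*
 * Hypothetical helper mirroring the patch hunk: when a change is serialized,
 * advance final_lsn to the change's LSN, but never move it backwards.  If
 * final_lsn was already set from a later commit record, it is left alone.
 */
static void
sketch_note_serialized_change(SketchTXN *txn, const SketchChange *change)
{
    assert(txn != NULL && change != NULL);

    if (txn->final_lsn < change->lsn)
        txn->final_lsn = change->lsn;
}
```

The comparison is what makes an Assert on "no newer final_lsn" unnecessary: serializing remaining changes after final_lsn has been set to the commit LSN simply leaves the value unchanged.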

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
