Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)
Date: 2022-09-05 06:35:01
Message-ID: CAA4eK1Lw34MSMUD61tQh4zw=Z0zhGSe42m8uy4DCqDjLYAVpoQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Sep 4, 2022 at 11:10 PM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> On 9/4/22 16:08, Tomas Vondra wrote:
> > ...
> >
> > so in fact we *know* 849 is a subxact of 848, but we don't call
> > ReorderBufferAssignChild in this case. In fact we can't even do the
> > assignment easily in this case, because we create the subxact first, so
> > that the crash happens right when we attempt to create the toplevel one,
> > and we never even get a chance to do the assignment:
> >
> > 1) process the NEW_CID record, logged for 849 (subxact)
> > 2) process CIDs in the WAL record, which has topleve_xid 848
> >
> >
> > So IMHO we need to figure out what to do for WAL records that create
> > both the toplevel and subxact - either we need to skip them, or rethink
> > how we create the ReorderBufferTXN structs.
> >
>
> This fixes the crash for me, by adding a ReorderBufferAssignChild call
> to SnapBuildProcessNewCid, and tweaking ReorderBufferAssignChild to
> ensure we don't try to create the top xact before updating the subxact
> and removing it from the toplevel_by_lsn list.
>
> Essentially, what's happening is this:
>
> 1) We read the NEW_CID record, which is logged with XID 849, i.e. the
> subxact. But we don't know it's a subxact, so we create it as a
> top-level xact with the LSN.
>
> 2) We start processing contents of the NEW_CID, which however has info
> that 849 is subxact of 848, calls ReorderBufferAddNewTupleCids which
> promptly does ReorderBufferTXNByXid() with the top-level XID, which
> creates it with the same LSN, and crashes because of the assert.
>
> I'm not sure what's the right/proper way to fix this ...
>
> The problem is ReorderBufferAssignChild was coded in a way that did not
> expect the subxact to be created first (as a top-level xact).
>

I think there was a previously hard-coded way to detect that and we
have removed it in commit
(https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=e3ff789acfb2754cd7b5e87f6f4463fd08e35996).
I think it is possible that subtransaction gets logged without
previous top-level txn record as shown in the commit shared.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2022-09-05 06:38:04 Re: POC: GROUP BY optimization
Previous Message Etsuro Fujita 2022-09-05 06:27:44 Re: Asynchronous execution support for Custom Scan