Re: ERROR: subtransaction logged without previous top-level txn record

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Arseny Sher <a(dot)sher(at)postgrespro(dot)ru>
Cc: Andres Freund <andres(at)anarazel(dot)de>, "Hsu, John" <hsuchen(at)amazon(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: ERROR: subtransaction logged without previous top-level txn record
Date: 2020-02-03 06:45:21
Message-ID: CAA4eK1L=MDbmGu5-+BmY7Svc07jr+ZabiH8C_qo3RSc8pgUpDQ@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Fri, Oct 25, 2019 at 12:26 PM Arseny Sher <a(dot)sher(at)postgrespro(dot)ru> wrote:
>
>
> Andres Freund <andres(at)anarazel(dot)de> writes:
>
> > Hi,
> >
> > On 2019-10-24 12:59:30 +0300, Arseny Sher wrote:
> >> Our customer also encountered this issue and I've looked into it. The problem is
> >> reproduced well enough using the instructions in the previous message.
> >
> > Is this with
> >
> > commit bac2fae05c7737530a6fe8276cd27d210d25c6ac
> > Author: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
> > Date: 2019-09-13 16:36:28 -0300
> >
> > logical decoding: process ASSIGNMENT during snapshot build
> >
> > Most WAL records are ignored in early SnapBuild snapshot build phases.
> > But it's critical to process some of them, so that later messages have
> > the correct transaction state after the snapshot is completely built; in
> > particular, XLOG_XACT_ASSIGNMENT messages are critical in order for
> > sub-transactions to be correctly assigned to their parent transactions,
> > or at least one assert misbehaves, as reported by Ildar Musin.
> >
> > Diagnosed-by: Masahiko Sawada
> > Author: Masahiko Sawada
> > Discussion: https://postgr.es/m/CAONYFtOv+Er1p3WAuwUsy1zsCFrSYvpHLhapC_fMD-zNaRWxYg@mail.gmail.com
> >
> > applied?
>
> Yeah, I tried fresh master. See below: the skipped xl_xact_assignment lies
> entirely before restart_lsn (and thus is never read), so this doesn't matter.
>
>
> >> The check leading to this ERROR is too strict: it forbids legitimate behaviour. Say
> >> we have in WAL
> >>
> >> [ <xl_xact_assignment_1> <restart_lsn> <subxact_change> <xl_xact_assignment_1> <commit> <confirmed_flush_lsn> ]
> >>
> >> - The first xl_xact_assignment record is never read, because it lies before
> >> restart_lsn, the point where the ready snapshot is loaded from disk.
> >> - After restart_lsn there is some change made by a subxact.
> >> - After that comes a second xl_xact_assignment (for another subxact)
> >> revealing the relationship between the top-level xact and the first subxact;
> >> this is where the ERROR fires.
> >>
> >> Such a transaction won't be streamed because we haven't seen it in full. It must
> >> have finished before streaming starts, i.e. before confirmed_flush_lsn.
> >>
> >> Of course, the easiest fix is to just throw the check out.
> >
> > I don't think that'd actually be a fix; it would just be hiding a bug.
>
> I don't see a bug here. At least in the reproduced scenario I see a false
> alarm, as explained above: the transaction with the skipped xl_xact_assignment
> won't be streamed, as it finishes before confirmed_flush_lsn.
>

Does this guarantee come from the fact that we need to wait for such a
transaction before reaching a consistent snapshot state? If not, can
you explain a bit more what makes you say so?
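
To make the scenario under discussion concrete, here is a rough toy model
in Python. It is only a sketch of my reading of the description above, not
PostgreSQL code: the record layout, the xids (100, 101), the LSN values and
the decode() helper are all invented for illustration, and the check is
simplified to "top-level xid is new to the decoder while the subxid is
already known". Whether a commit is streamed is likewise reduced to
comparing its LSN with confirmed_flush_lsn, which is exactly the assumption
being questioned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical, heavily simplified WAL record."""
    lsn: int
    kind: str            # "assignment", "change" or "commit"
    xid: int             # top-level xid (assignment/commit) or writer xid (change)
    subxids: tuple = ()  # only meaningful for "assignment"


class StrictCheckError(Exception):
    """Stands in for the ERROR discussed in this thread."""


def decode(wal, restart_lsn, confirmed_flush_lsn, strict=True):
    """Replay records at or after restart_lsn; return the xids that would be streamed."""
    known = set()   # xids the decoder has already created an entry for
    streamed = []
    for rec in wal:
        if rec.lsn < restart_lsn:
            continue  # records before restart_lsn are never read
        if rec.kind == "change":
            # A change from an unseen xid creates an entry for it.
            known.add(rec.xid)
        elif rec.kind == "assignment":
            for sub in rec.subxids:
                new_top = rec.xid not in known
                new_sub = sub not in known
                known.add(rec.xid)
                # Simplified form of the check: top is new, sub is not.
                if strict and new_top and not new_sub:
                    raise StrictCheckError(
                        "subtransaction logged without previous "
                        "top-level txn record")
                known.add(sub)
        elif rec.kind == "commit":
            # Assumption: only commits at or after confirmed_flush_lsn are sent.
            if rec.lsn >= confirmed_flush_lsn:
                streamed.append(rec.xid)
    return streamed


# The quoted WAL layout, with invented LSNs: restart_lsn = 20,
# confirmed_flush_lsn = 60.
EXAMPLE_WAL = [
    Record(10, "assignment", 100, (101,)),  # before restart_lsn, skipped
    Record(30, "change", 101),              # subxact change
    Record(40, "assignment", 100, (101,)),  # reveals 101 belongs to 100
    Record(50, "commit", 100),              # finishes before confirmed_flush_lsn
]
```

In this model, decode(EXAMPLE_WAL, 20, 60) raises at the second assignment,
while decode(EXAMPLE_WAL, 20, 60, strict=False) streams nothing because the
commit precedes confirmed_flush_lsn. The model takes that last ordering as
given; it says nothing about what guarantees it in the real system.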

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
