Re: ERROR: subtransaction logged without previous top-level txn record

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Arseny Sher <a(dot)sher(at)postgrespro(dot)ru>
Cc: Andres Freund <andres(at)anarazel(dot)de>, "Hsu, John" <hsuchen(at)amazon(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: ERROR: subtransaction logged without previous top-level txn record
Date: 2020-02-04 06:41:31
Message-ID: CAA4eK1LYzrZ_+8VhD_N_dsQwjxA9t+AyGKT-Wjnc8S7jCwAcBw@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Mon, Feb 3, 2020 at 7:16 PM Arseny Sher <a(dot)sher(at)postgrespro(dot)ru> wrote:
> Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:
>
> > So, doesn't this mean that it started occurring after the fix done in
> > commit 96b5033e11 [1]? Because before that fix we wouldn't have
> > allowed processing XLOG_XACT_ASSIGNMENT records unless we were in
> > SNAPBUILD_FULL_SNAPSHOT state. I am not saying the fix in that
> > commit is wrong, but just trying to understand the situation here.
>
> Nope. Consider again the example WAL above that triggers the error:
>
> [ <xl_xact_assignment_1> <restart_lsn> <subxact_change> <xl_xact_assignment_2> <commit> <confirmed_flush_lsn> ]
>
> The decoder starts reading WAL at <restart_lsn>, where it immediately
> reads from disk a snapshot serialized earlier, which makes it jump to
> SNAPBUILD_CONSISTENT right away.
>

Sure, I understand that if we get a serialized snapshot from disk, this
problem can occur, and we can probably fix it by ignoring serialized
snapshots during slot creation or something along those lines. However,
what I am trying to understand is whether this can also occur via
another path. I think it might occur even without serialized
snapshots, because we allow decoding the xl_xact_assignment record
even when the snapshot state is not full. Say, in your example above,
the snapshot state is not SNAPBUILD_CONSISTENT because we haven't
used the serialized snapshot; decoding of xl_xact_assignment can then
still lead to the same problem. I have not tried to generate a test
case for this, so I could easily be wrong here.

What I am trying to get at is this: if the problem can only occur by
using serialized snapshots, and we fix it by somehow not using them
during initial slot creation, then ideally we don't need to remove the
error in ReorderBufferAssignChild. But if that is not the case, then we
need to discuss the other cases as well.
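To make the ordering argument concrete, here is a hypothetical, heavily simplified Python model of the check in ReorderBufferAssignChild(), which raises the error when the top-level transaction is new to the reorder buffer but the subtransaction was already known. It tracks only which xids have been seen and deliberately ignores snapshot state, which is the situation asked about above. The class ReorderBufferModel, the function decode(), the list-index stand-in for restart_lsn, and the xid values are illustrative assumptions, not PostgreSQL code; a real xl_xact_assignment record can also carry several subxids rather than one.

```python
# Hypothetical, simplified model of the consistency check in PostgreSQL's
# ReorderBufferAssignChild(). Only the set of already-seen xids is modeled;
# snapshot state, LSNs, and real record layouts are deliberately ignored.


class ReorderBufferModel:
    def __init__(self):
        # xid -> parent xid, or None while the xid is treated as top-level
        self.txns = {}

    def lookup(self, xid):
        """Return True if this lookup created the entry (like new_top/new_sub)."""
        if xid in self.txns:
            return False
        self.txns[xid] = None
        return True

    def assign_child(self, xid, subxid):
        new_top = self.lookup(xid)
        new_sub = self.lookup(subxid)
        if new_top and not new_sub:
            raise RuntimeError(
                "subtransaction logged without previous top-level txn record")
        self.txns[subxid] = xid


def decode(wal, start):
    """Replay records from index `start` (the model's restart_lsn) onward."""
    rb = ReorderBufferModel()
    for kind, *xids in wal[start:]:
        if kind == "assignment":
            rb.assign_child(xids[0], xids[1])
        else:
            # Any other record (change, commit) registers its xid, which
            # makes a not-yet-assigned subxact look like a top-level txn.
            rb.lookup(xids[0])
    return rb


# The WAL layout from the example above; xids 100 (top-level), 101 and 102
# (subxacts) are made up for illustration.
WAL = [
    ("assignment", 100, 101),  # <xl_xact_assignment_1>
    ("change", 102),           # <subxact_change>, after <restart_lsn>
    ("assignment", 100, 102),  # <xl_xact_assignment_2>
    ("commit", 100),           # <commit>
]
```

In this model, decode(WAL, 0) succeeds and records 102 as a child of 100, while decode(WAL, 1) raises the error at the second assignment record: 102 was first seen as a top-level entry and 100 was not seen at all. Nothing in the model depends on snapshot state, so if the real decoder processes assignment records before reaching a full snapshot, the same ordering would plausibly trigger the error there too.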

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
