Re: ERROR: subtransaction logged without previous top-level txn record

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Arseny Sher <a(dot)sher(at)postgrespro(dot)ru>
Cc: Andres Freund <andres(at)anarazel(dot)de>, "Hsu, John" <hsuchen(at)amazon(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: ERROR: subtransaction logged without previous top-level txn record
Date: 2020-02-07 10:59:25
Message-ID: CAA4eK1Jdh0zab=+D91MkFPbevzyjFCZsMwRsQxJh4F9+m_vCRA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On Wed, Feb 5, 2020 at 2:34 PM Arseny Sher <a(dot)sher(at)postgrespro(dot)ru> wrote:
>
> Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:
>
> > On Mon, Feb 3, 2020 at 7:16 PM Arseny Sher <a(dot)sher(at)postgrespro(dot)ru> wrote:
> >> Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:
> >>
> >> > So, doesn't this mean that it started occurring after the fix done in
> >> > commit 96b5033e11 [1]? Because before that fix we wouldn't have
> >> > allowed processing XLOG_XACT_ASSIGNMENT records unless we are in
> >> > SNAPBUILD_FULL_SNAPSHOT state. I am not telling the fix in that
> >> > commit is wrong, but just trying to understand the situation here.
> >>
> >> Nope. Consider again example of WAL above triggering the error:
> >>
> >> [ <xl_xact_assignment_1> <restart_lsn> <subxact_change> <xl_xact_assignment_2> <commit> <confirmed_flush_lsn> ]
> >>
> >> Decoder starting reading WAL at <restart_lsn> where he immediately reads
> >> from disk snapshot serialized earlier, which makes it jump to
> >> SNAPBUILD_CONSISTENT right away.
> >>
> >
> > Sure, I understand that if we get serialized snapshot from disk, this
> > problem can occur and probably we can fix by ignoring serialized
> > snapshots during slot creation or something on those lines.
>
> There is some confusion. I'll try to reword what we have here:
>
> 1) Decoding from existing slot (*not* initial snapshot construction)
> starts up, immediately picks up snapshot at restart_lsn (getting into
> SNAPBUILD_CONSISTENT) and in some xl_xact_assignment learns that it
> hadn't seen in full (no toplevel records) transaction which it is not
> even going to stream -- but still dies with "subtransation logged
> without...". That's my example above, and that's what people are
> complaining about. Here, usage of serialized snapshot and jump to
> SNAPBUILD_CONSISTENT is not just legit, it is essential: or order to be
> able to stream data since confirmed_flush_lsn, we must pick it up as we
> might not be able to assemble it from scratch in time. I mean, what is
> wrong here is not serialized snapshot usage but the check.
>

I was thinking if we have some way to skip processing of
xl_xact_assignment for such cases, then it might be better. Say,
along with restart_lsn, if have some way to find corresponding nextXid
(below which we don't need to process records). Say, if, during
SnapBuildProcessRunningXacts, we record the xid of txn we got by
ReorderBufferGetOldestTXN in slot, then can't we use it to skip such
records.

> (Lengthy comment to AllocateSnapshotBuilder in my
> 0001-Fix-serialized-snapshot-usage-for-new-logical-slots.patch explains
> why snapbuilder is not able to do FULL -> CONSISTENT transition on its
> own early enough for decoding from existing slot, so the jump on
> snapshot pickup is performed to CONSISTENT directly.)
>
> This happens with or without bac2fae05c.
>
> 2) *New* slot creationg process picks up serialized snapshot and jumps
> to CONSISTENT without waiting for all running xacts to finish. This is
> wrong and is a bug (of very low probability), as we risk promising to
> decode xacts which we might not seen in full. Sometimes it could be
> arrested by "subtransation logged without..." check, but not necessarily
> -- e.g. there could be no subxacts at all.
>
>
> > However,
> > what I am trying to understand is whether this can occur from another
> > path as well. I think it might occur without using serialized
> > snapshots as well because we allow decoding xl_xact_assignment record
> > even when the snapshot state is not full. Say in your above example,
> > even if the snapshot state is not SNAPBUILD_CONSISTENT as we haven't
> > used the serialized snapshot, then also, it can lead to the above
> > problem due to decoding of xl_xact_assignment. I have not tried to
> > generate a test case for this, so I could easily be wrong here.
>
> What you are suggesting here is 3), which is, well, sort of form of 1),
> meaning "subxact logged..." error also pointlessly triggered, but for
> new slot creation. With bac2fae0, decoder might miss first
> xl_xact_assignment because it is beyond start of reading WAL but
> encounter second xl_xact_assignment and die on it due to this check
> before even getting FULL.
>
> But now that I'm thinking about it, I suspect that similar could happen
> even before bac2fae0. Imagine
>
> <start_of_reading_wal> <xl_xact_assignment_1> <SNAPBUILD_FULL> <subxact_change> <xl_xact_assignment_2> <commit> ... <SNAPBUILD_CONSISTENT>
>
> Before bac2fae0, xl_xact_assignment_1 was ignored, so
> xl_xact_assignment_1 would trigger the error.
>

'xl_xact_assignment_1 would trigger the error', I think in this part
of sentence you mean to say xl_xact_assignment_2 because we won't try
to decode xl_xact_assignment_1 before bac2fae0. If so, won't we wait
for such a transaction to finish while changing the snapshot state
from SNAPBUILD_BUILDING_SNAPSHOT to SNAPBUILD_FULL_SNAPSHOT? And if
the transaction is finished, ideally, we should not try to decode its
WAL and or create its ReorderBufferTxn.

> > What I am trying to get at is if the problem can only occur by using
> > serialized snapshots and we fix it by somehow not using them initial
> > slot creation, then ideally we don't need to remove the error in
> > ReorderBufferAssignChild, but if that is not the case, then we need to
> > discuss other cases as well.
>
> So, 1) and 3) mean this is not the case.
>

Right, I am thinking that if we can find some way to skip the xact
assignment for (1) and (3), then that might be a better fix.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Alvaro Herrera 2020-02-07 15:04:32 Re: FK violation in partitioned table after truncating a referenced partition
Previous Message Jehan-Guillaume de Rorthais 2020-02-07 10:32:03 Re: FK violation in partitioned table after truncating a referenced partition

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2020-02-07 11:40:33 Re: typedef SegmentNumber
Previous Message Dilip Kumar 2020-02-07 10:48:15 Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions