Re: ERROR: subtransaction logged without previous top-level txn record

From: Arseny Sher <a(dot)sher(at)postgrespro(dot)ru>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, "Hsu, John" <hsuchen(at)amazon(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: ERROR: subtransaction logged without previous top-level txn record
Date: 2020-02-09 16:07:50
Message-ID: 8736bjoiax.fsf@ars-thinkpad
Lists: pgsql-bugs pgsql-hackers


Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:

>> 1) Decoding from an existing slot (*not* initial snapshot construction)
>> starts up, immediately picks up the snapshot at restart_lsn (getting into
>> SNAPBUILD_CONSISTENT) and in some xl_xact_assignment learns about a
>> transaction which it hadn't seen in full (no toplevel records) and which
>> it is not even going to stream -- but still dies with "subtransaction
>> logged without...". That's my example above, and that's what people are
>> complaining about. Here, using the serialized snapshot and jumping to
>> SNAPBUILD_CONSISTENT is not just legit, it is essential: in order to be
>> able to stream data since confirmed_flush_lsn, we must pick it up as we
>> might not be able to assemble it from scratch in time. I mean, what is
>> wrong here is not the serialized snapshot usage but the check.
>>
>
> I was thinking if we have some way to skip processing of
> xl_xact_assignment for such cases, then it might be better. Say,
> along with restart_lsn, if we have some way to find corresponding nextXid
> (below which we don't need to process records).

I don't believe you can do that without persisting additional
data. Basically, what we need is a list of transactions that are running at
the point of snapshot serialization *and* already wrote something before
it -- those we hadn't seen in full and can't decode. We have no such
data currently. The closest thing we have is xl_running_xacts->nextXid,
but

1) an issued xid doesn't necessarily mean the xact actually wrote something,
so we can't just skip xl_xact_assignment for xid < nextXid; such a xact
might still be decoded
2) the snapshot might not be serialized at xl_running_xacts anyway
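
Point 1 can be illustrated with a toy model. This is not PostgreSQL code:
the Xact class, the LSN and xid values, and both helper functions are made
up for the illustration, and it only assumes that an xid can be issued
before the xact's first WAL record is written.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Xact:
    """Hypothetical stand-in for a running transaction."""
    xid: int
    first_write_lsn: Optional[int]  # None: xid issued, nothing written yet


def seen_only_partially(xact: Xact, snapshot_lsn: int) -> bool:
    """The set we actually need: xacts that wrote before the snapshot point."""
    return xact.first_write_lsn is not None and xact.first_write_lsn < snapshot_lsn


def skipped_by_next_xid(xact: Xact, next_xid: int) -> bool:
    """The tempting heuristic: skip everything with xid < nextXid."""
    return xact.xid < next_xid


# Made-up values for the illustration.
SNAPSHOT_LSN = 1000
NEXT_XID = 103

running = [
    Xact(100, 900),   # wrote before the snapshot: really seen only partially
    Xact(101, 1100),  # xid issued before the snapshot, first write after it
    Xact(102, None),  # xid issued, nothing written so far
]

partially_seen = [x.xid for x in running if seen_only_partially(x, SNAPSHOT_LSN)]
wrongly_skipped = [
    x.xid
    for x in running
    if skipped_by_next_xid(x, NEXT_XID) and not seen_only_partially(x, SNAPSHOT_LSN)
]

print("partially seen:", partially_seen)
print("wrongly skipped by xid < nextXid:", wrongly_skipped)
```

In this model only xid 100 is really unseen in full, while the nextXid
comparison would also skip 101 and 102, which could still be decoded
completely.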

Surely this thing doesn't deserve changing persisted data format.

Somehow I hadn't realized this earlier, so my comments/commit messages
in the patches above were not accurate here; I've edited them. Also, in the
first patch serialized snapshots are no longer used for new slot
creation at all, as Andres suggested above. This is not essential, as I
said, but arguably makes things a bit simpler.

I've also found a couple of issues with the slot copying feature; I will
post about them in a separate thread.

Attachment Content-Type Size
0001-Don-t-use-serialized-snapshots-during-logical-slo-v2.patch text/x-diff 10.1 KB
0002-Stop-demanding-that-top-xact-must-be-seen-before--v2.patch text/x-diff 1.8 KB
