Re: ERROR: subtransaction logged without previous top-level txn record

From: Arseny Sher <a(dot)sher(at)postgrespro(dot)ru>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, "Hsu\, John" <hsuchen(at)amazon(dot)com>, "pgsql-bugs\(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: ERROR: subtransaction logged without previous top-level txn record
Date: 2020-03-04 13:29:44
Message-ID: 87lfog6yev.fsf@ars-thinkpad
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers


Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:

>> Because when we create the slot we don't demand to stream from some
>> specific point. In fact we just can't, because we don't know since which
>> LSN it is actually possible to stream, i.e. when we'd have good snapshot
>> and no old (which we haven't seen in full) xacts running. It is up to
>> snapbuild.c to define this point. The previous coding was meaningless:
>> we asked for some random restart_lsn and snapbuild.c would silently
>> advance it to earliest suitable LSN.
>>
>
> Hmm, if this is the case then it should be true even without solving
> this particular problem and we should be able to make this change.

Right.

> Leaving that aside, I think this change can make copy replication slot
> functionality to also skip using serialized snapshots with this patch
> which is not our intention.

As I say at [1] logical slot copying facility is currently anyway broken
in this regard: restart_lsn is copied, but confirmed_flush isn't, and
the right fix, in my view, is to avoid DecodingContextFindStartpoint
there altogether (by checking donor's confirmed_flush is valid and
copying it) which would render this irrelevant. To speculate, even if
wanted to go through DecodingContextFindStartpoint for slot copying and
establish confirmed_flush on our own, surely we'd need to handle
serialized snapshots exactly as new slot creation does because dangers
of getting SNAPBUILD_CONSISTENT too early are the same in both cases.

> Also, it doesn't seem like a good idea to ignore setting
> start_decoding_at when we already set slot->data.restart_lsn with this
> value.

Well, these two fields have absolutely different values. BTW I find the
naming here somewhat unfortunate, and this phrase suggests that it
indeed leads to confusion.

Slot's restart_lsn is the LSN since which we start reading WAL and by
setting data.restart_lsn we prevent WAL we need from
recycling. start_decoding_at is the LSN since which we start
*streaming*, i.e. actually replaying COMMITs. So setting the first one
(as we must hold WAL) and not the second one (as we don't know the
streaming point yet when we start slot creation) is just fine.

[1] https://www.postgresql.org/message-id/flat/CA%2Bfd4k70BXLTm-N6q18LrL%3DGbKtwY3-2%2B%2BUVFw05SvFTkZgTyQ%40mail.gmail.com

-- cheers, arseny

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Christoph Berg 2020-03-05 10:32:28 Wrong de translation for commands/tablecmds.c:5028
Previous Message Przemysław Szustak 2020-03-03 16:08:18 Re: BUG #16283: crash on create index segmentation fault

Browse pgsql-hackers by date

  From Date Subject
Next Message tushar 2020-03-04 13:51:03 Re: backup manifests
Previous Message Andy Fan 2020-03-04 13:13:54 Re: [PATCH] Erase the distinctClause if the result is unique by definition