Re: ERROR: subtransaction logged without previous top-level txn record

From: Arseny Sher <a(dot)sher(at)postgrespro(dot)ru>
To: Dan Katz <dkatz(at)joor(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, "Hsu\, John" <hsuchen(at)amazon(dot)com>, "pgsql-bugs\(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: ERROR: subtransaction logged without previous top-level txn record
Date: 2020-01-30 21:22:46
Message-ID: 87ftfwwsex.fsf@ars-thinkpad
Lists: pgsql-bugs pgsql-hackers

Hi,

Dan Katz <dkatz(at)joor(dot)com> writes:

> Arseny,
>
> I was hoping you could give me some insights about how this bug might
> appear with multiple replication slots. For example, if I have two
> replication slots would you expect both slots to see the same error, even
> if they were started, consumed or the LSN was confirmed-flushed at
> different times?

Well, to encounter this you must happen to interrupt a decoding session
(e.g. shut down the server) when restart_lsn (the LSN from which WAL
will be read next time) is at an unfortunate position, as described in
https://www.postgresql.org/message-id/87ftjifoql.fsf%40ars-thinkpad

Generally each slot has its own restart_lsn, so if one decoding session
gets stuck on this issue, another one won't necessarily fail at the same
time. However, restart_lsn can be advanced only to certain points,
mainly xl_running_xacts records, which are logged every 15 seconds. So if
all consumers acknowledge changes fast enough, it is quite likely that
at shutdown restart_lsn will be the same for all slots -- which
means either all of them will get stuck on further decoding or none of
them will. If not, different slots might have different restart_lsn values
and probably won't fail at the same time; but encountering this issue even
once suggests that your workload makes the chance of hitting such a
problematic restart_lsn noticeable (i.e. it has many subtransactions). And
each restart_lsn probably has approximately the same chance of being 'bad'
(provided the workload is even).
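
To illustrate the point about slots converging on the same restart_lsn,
here is a toy model in Python. It is only a sketch, not PostgreSQL code:
plain integers (seconds) stand in for LSNs, the function names and slot
names are made up for the example, and it assumes restart_lsn is simply
the latest xl_running_xacts position not past what the consumer has
confirmed. The real computation also depends on the oldest transaction
still running at that record.

```python
from bisect import bisect_right

# Seconds between xl_running_xacts records, as stated in the text above.
RUNNING_XACTS_INTERVAL = 15


def candidate_restart_points(duration, interval=RUNNING_XACTS_INTERVAL):
    """Positions (seconds standing in for LSNs) where xl_running_xacts is logged."""
    return list(range(0, duration + 1, interval))


def restart_point(confirmed, candidates):
    """Latest candidate position that is not past the confirmed position.

    Assumption of this toy model only; real restart_lsn selection is more
    involved.
    """
    idx = bisect_right(candidates, confirmed)
    return candidates[idx - 1]  # candidates[0] == 0, so idx >= 1 when confirmed >= 0


candidates = candidate_restart_points(120)

# Hypothetical slots whose consumers all confirmed shortly before a shutdown.
fast = {"slot_a": 99, "slot_b": 97, "slot_c": 92}
# Hypothetical slots where some consumers lag far behind.
mixed = {"slot_a": 99, "slot_b": 61, "slot_c": 20}

fast_restart = {name: restart_point(c, candidates) for name, c in fast.items()}
mixed_restart = {name: restart_point(c, candidates) for name, c in mixed.items()}

print(fast_restart)   # every slot lands on the same candidate position
print(mixed_restart)  # lagging slots land on different, earlier positions
```

In this model the fast consumers share one restart position, so they would
all hit a 'bad' position together or not at all, while the lagging ones
each get their own.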

We need a committer familiar with this code to look here...

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
