Re: ERROR: subtransaction logged without previous top-level txn record

From: Andres Freund <andres(at)anarazel(dot)de>
To: Arseny Sher <a(dot)sher(at)postgrespro(dot)ru>
Cc: "Hsu, John" <hsuchen(at)amazon(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: ERROR: subtransaction logged without previous top-level txn record
Date: 2019-10-24 21:31:57
Message-ID: 20191024213157.7pm6niybfxgpvmgg@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

Hi,

On 2019-10-24 12:59:30 +0300, Arseny Sher wrote:
> Our customer also encountered this issue and I've looked into it. The problem is
> reproduced well enough using the instructions in the previous message.

Is this with

commit bac2fae05c7737530a6fe8276cd27d210d25c6ac
Author: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Date: 2019-09-13 16:36:28 -0300

logical decoding: process ASSIGNMENT during snapshot build

Most WAL records are ignored in early SnapBuild snapshot build phases.
But it's critical to process some of them, so that later messages have
the correct transaction state after the snapshot is completely built; in
particular, XLOG_XACT_ASSIGNMENT messages are critical in order for
sub-transactions to be correctly assigned to their parent transactions,
or at least one assert misbehaves, as reported by Ildar Musin.

Diagnosed-by: Masahiko Sawada
Author: Masahiko Sawada
Discussion: https://postgr.es/m/CAONYFtOv+Er1p3WAuwUsy1zsCFrSYvpHLhapC_fMD-zNaRWxYg@mail.gmail.com

applied?

> The check leading to this ERROR is too strict, it forbids legit behaviours. Say
> we have in WAL
>
> [ <xl_xact_assignment_1> <restart_lsn> <subxact_change> <xl_xact_assignment_1> <commit> confirmed_flush_lsn> ]
>
> - First xl_xact_assignment record is beyond reading, i.e. earlier
> restart_lsn, where ready snapshot will be taken from disk.
> - After restart_lsn there is some change of a subxact.
> - After that, there is second xl_xact_assignment (for another subxact)
> revealing relationship between top and first subxact, where this ERROR fires.
>
> Such transaction won't be streamed because we hadn't seen it in full. It must be
> finished before streaming will start, i.e. before confirmed_flush_lsn.
>
> Of course, the easiest fix is to just throw the check out.

I don't think that'd actually be a fix, and just hiding a bug.

> Snapbuilder enters into SNAPBUILD_CONSISTENT immediately after deserializing the
> snapshot. Generally this is incorrect because SNAPBUILD_CONSISTENT means not
> just complete snapshot (snapshot by itself in FULL state is just good as in
> CONSISTENT), but also reorderbuffer filled with all currently running
> xacts. This is painless for decoding sessions with existing slots because they
> won't stream anything before confirmed_flush_lsn is reached anyway, at which
> point all transactions which hadn't got into reorderbuffer would definitely
> finish. However, new slots might be created too early, thus losing (not
> decoding) parts of transactions committed after freshly created
> confirmed_flush_lsn. This can happen under the following extremely unlucky
> circumstances:
> - New slot creation reserves point in WAL since which it would read it
> (GetXLogInsertRecPtr);
> - It logs xl_running_xacts to start assembling a snapshot;
> - Running decoding session with another slot quickly reads this
> xl_running_xacts and serializes its snapshot;
> - New slot reads xl_running_xacts and picks this snapshot up, saying that it
> is ready to stream henceforth, though its reorderbuffer is empty.

Yea, that's a problem :(

> Exact reproducing steps:
>
> -- session 1
> create table t (i int);
> select pg_create_logical_replication_slot('slot_1', 'test_decoding');
>
> -- session 2
> begin;
> insert into t values (1);
>
> -- session 3, start slot creation
> select pg_create_logical_replication_slot('slot_2', 'test_decoding');
> -- stop (with gdb or something) it at DecodingContextFindStartpoint(ctx);
>
> -- session 1
> -- xl_running_xacts is dumped by ReplicationSlotReserveWal in previous command, no
> -- need to sleep; our snap will be immediately serialized there
> SELECT data FROM pg_logical_slot_get_changes('slot_1', NULL, NULL, 'include-xids', '1', 'skip-empty-xacts', '0');
>
> -- continue slot_2 creation
>
> -- session 2: insert some more and commit
> insert into t values (1);
> commit;
>
> -- now this would find second insert, but not the first one
> SELECT data FROM pg_logical_slot_get_changes('slot_2', NULL, NULL, 'include-xids', '1', 'skip-empty-xacts', '0');

It'd be good to make this into an isolation test.

> What we can do here?

I think the easiest fix might actually be to just ignore serialized
snapshots when building the initial snapshot.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2019-10-24 21:38:11 Re: BUG #16046: xpath returns CDATA tag along with the value in postgres 12
Previous Message Andres Freund 2019-10-24 21:17:32 Re: BUG #16073: pg_dump build crashes half way

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-10-24 21:48:31 Re: psql tab-complete
Previous Message Victor Spirin 2019-10-24 21:31:11 Re: psql tab-complete