Re: Duplicated LSN in ReorderBuffer

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Ildar Musin <ildar(at)adjust(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Duplicated LSN in ReorderBuffer
Date: 2019-08-07 20:19:13
Message-ID: 20190807201913.GA10297@alvherre.pgsql
Lists: pgsql-hackers

On 2019-Jul-26, Andres Freund wrote:

> 2) We could simply assign the subtransaction to the parent using
> ReorderBufferAssignChild() in SnapBuildProcessNewCid() or its
> caller. That ought to also fix the bug.
> It also has the advantage that we can save some memory in transactions
> that have some, but fewer than the ASSIGNMENT limit subtransactions,
> because it allows us to avoid having a separate base snapshot for
> them (c.f. ReorderBufferTransferSnapToParent()).

I'm not sure I understood this suggestion correctly. I first tried
this, which seems the simplest rendition:

--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -772,6 +772,12 @@ SnapBuildProcessNewCid(SnapBuild *builder, TransactionId xid,
     CommandId   cid;

+    if ((SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT) &&
+        (xlrec->top_xid != xid))
+    {
+        ReorderBufferAssignChild(builder->reorder, xlrec->top_xid, xid, lsn);
+    }
     /*
      * we only log new_cid's if a catalog tuple was modified, so mark the
      * transaction as containing catalog modifications

test_decoding's tests pass with that, but if I try the example script
provided by Ildar, all pgbench clients die with this:

client 19 script 1 aborted in command 1 query 0: ERROR: subtransaction logged without previous top-level txn record
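
As a rough illustration of how that error can come about, here is a toy
model in C. It is not PostgreSQL code: ToyBuffer, toy_lookup_or_create and
toy_assign_child are made-up names, and the condition it models (complain
when the top-level entry had to be created but the subtransaction entry
already existed) is my assumption about what the check in
ReorderBufferAssignChild() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_TXNS 8

/* Hypothetical stand-in for the reorder buffer's set of known xids. */
typedef struct ToyBuffer
{
    uint32_t xids[TOY_MAX_TXNS];
    size_t   n;
} ToyBuffer;

/* Find xid in the buffer, adding it if absent; *is_new says which happened. */
static void
toy_lookup_or_create(ToyBuffer *rb, uint32_t xid, bool *is_new)
{
    for (size_t i = 0; i < rb->n; i++)
    {
        if (rb->xids[i] == xid)
        {
            *is_new = false;
            return;
        }
    }
    assert(rb->n < TOY_MAX_TXNS);
    rb->xids[rb->n++] = xid;
    *is_new = true;
}

/*
 * Assumed shape of the sanity check: return false (where the real code
 * would raise the error) if the top-level entry is new while the
 * subtransaction entry was already known.
 */
static bool
toy_assign_child(ToyBuffer *rb, uint32_t top_xid, uint32_t sub_xid)
{
    bool new_top;
    bool new_sub;

    toy_lookup_or_create(rb, top_xid, &new_top);
    toy_lookup_or_create(rb, sub_xid, &new_sub);
    return !(new_top && !new_sub);
}
```

In this model, a buffer that has already seen the subtransaction's xid
before the assignment trips the check, while an empty buffer does not.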

I thought I would create the main txn before calling AssignChild in
snapbuild; however, ReorderBufferTXNByXid is static in reorderbuffer.c.
So that seems out. My next try was to remove the elog() that was
causing the failure ... but that leads pretty quickly to a crash with
this backtrace:

#2 0x00005653241fb823 in ExceptionalCondition (conditionName=conditionName@entry=0x5653243c1960 "!(prev_first_lsn < cur_txn->first_lsn)",
    errorType=errorType@entry=0x565324250596 "FailedAssertion",
    fileName=fileName@entry=0x5653243c18e8 "/pgsql/source/master/src/backend/replication/logical/reorderbuffer.c",
    lineNumber=lineNumber@entry=680) at /pgsql/source/master/src/backend/utils/error/assert.c:54
#3 0x0000565324062a84 in AssertTXNLsnOrder (rb=rb@entry=0x565326304fa8)
    at /pgsql/source/master/src/backend/replication/logical/reorderbuffer.c:680
#4 0x0000565324062e39 in ReorderBufferTXNByXid (rb=rb@entry=0x565326304fa8, xid=<optimized out>, xid@entry=185613, create=create@entry=true,
    is_new=is_new@entry=0x0, lsn=lsn@entry=2645271944, create_as_top=create_as_top@entry=true)
    at /pgsql/source/master/src/backend/replication/logical/reorderbuffer.c:559
#5 0x0000565324067365 in ReorderBufferAddNewTupleCids (rb=0x565326304fa8, xid=185613, lsn=lsn@entry=2645271944, node=..., tid=..., cmin=0,
    cmax=4294967295, combocid=4294967295) at /pgsql/source/master/src/backend/replication/logical/reorderbuffer.c:2100
#6 0x0000565324069451 in SnapBuildProcessNewCid (builder=0x56532630afd8, xid=185614, lsn=2645271944, xlrec=0x5653262efc78)
    at /pgsql/source/master/src/backend/replication/logical/snapbuild.c:787

Now this failure goes away if I relax the < to <= in the failing
assertion in AssertTXNLsnOrder ... but at this point it's two sanity
checks that I've lobotomized in order to get this to run at all. Not
really comfortable with that.
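
To make the effect of that relaxation concrete, here is a toy sketch in C.
It is not the real AssertTXNLsnOrder: ToyTxn, Lsn and toy_lsn_order_ok are
made-up names, and the xid and LSN values are borrowed from the backtrace
purely for illustration. Which entries actually share an LSN in the real
list is an assumption on my part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for XLogRecPtr; not the real PostgreSQL typedef. */
typedef uint64_t Lsn;

/* Hypothetical, heavily simplified transaction entry. */
typedef struct ToyTxn
{
    uint32_t xid;
    Lsn      first_lsn;
} ToyTxn;

/*
 * Return true if first_lsn is ordered along the array.  With strict = true
 * the comparison is "prev < cur", as in the assertion text shown in the
 * backtrace; with strict = false it is the relaxed "prev <= cur".
 */
static bool
toy_lsn_order_ok(const ToyTxn *txns, size_t ntxns, bool strict)
{
    assert(txns != NULL || ntxns == 0);

    for (size_t i = 1; i < ntxns; i++)
    {
        Lsn prev = txns[i - 1].first_lsn;
        Lsn cur = txns[i].first_lsn;

        if (strict ? !(prev < cur) : !(prev <= cur))
            return false;
    }
    return true;
}
```

Two entries with the same first_lsn fail the strict form and pass the
relaxed one, which is why the change silences the assertion without
explaining how two entries ended up at the same LSN in the first place.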

Álvaro Herrera
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
snapbuild-child.patch text/x-diff 1.6 KB

