Re: Duplicated LSN in ReorderBuffer

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Ildar Musin <ildar(at)adjust(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Duplicated LSN in ReorderBuffer
Date: 2019-08-07 20:19:13
Message-ID: 20190807201913.GA10297@alvherre.pgsql
Lists: pgsql-hackers

On 2019-Jul-26, Andres Freund wrote:

> 2) We could simply assign the subtransaction to the parent using
> ReorderBufferAssignChild() in SnapBuildProcessNewCid() or its
> caller. That ought to also fix the bug.
>
> It also has the advantage that we can save some memory in transactions
> that have some subtransactions, but fewer than the ASSIGNMENT limit,
> because it allows us to avoid having a separate base snapshot for
> them (cf. ReorderBufferTransferSnapToParent()).

I'm not sure I understood this suggestion correctly. I first tried with
this, which seems the simplest rendition:

--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -772,6 +772,12 @@ SnapBuildProcessNewCid(SnapBuild *builder, TransactionId xid,
 {
 	CommandId	cid;
 
+	if ((SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT) &&
+		(xlrec->top_xid != xid))
+	{
+		ReorderBufferAssignChild(builder->reorder, xlrec->top_xid, xid, lsn);
+	}
+
 	/*
 	 * we only log new_cid's if a catalog tuple was modified, so mark the
 	 * transaction as containing catalog modifications

test_decoding's tests pass with that, but if I try the example script
provided by Ildar, all pgbench clients die with this:

client 19 script 1 aborted in command 1 query 0: ERROR: subtransaction logged without previous top-level txn record

I thought I would create the main txn before calling AssignChild in
snapbuild; however, ReorderBufferTXNByXid is static in reorderbuffer.c.
So that seems out. My next try was to remove the elog() that was
causing the failure ... but that leads pretty quickly to a crash with
this backtrace:

#2 0x00005653241fb823 in ExceptionalCondition (conditionName=conditionName@entry=0x5653243c1960 "!(prev_first_lsn < cur_txn->first_lsn)",
    errorType=errorType@entry=0x565324250596 "FailedAssertion",
    fileName=fileName@entry=0x5653243c18e8 "/pgsql/source/master/src/backend/replication/logical/reorderbuffer.c",
    lineNumber=lineNumber@entry=680) at /pgsql/source/master/src/backend/utils/error/assert.c:54
#3 0x0000565324062a84 in AssertTXNLsnOrder (rb=rb@entry=0x565326304fa8)
    at /pgsql/source/master/src/backend/replication/logical/reorderbuffer.c:680
#4 0x0000565324062e39 in ReorderBufferTXNByXid (rb=rb@entry=0x565326304fa8, xid=<optimized out>, xid@entry=185613, create=create@entry=true,
    is_new=is_new@entry=0x0, lsn=lsn@entry=2645271944, create_as_top=create_as_top@entry=true)
    at /pgsql/source/master/src/backend/replication/logical/reorderbuffer.c:559
#5 0x0000565324067365 in ReorderBufferAddNewTupleCids (rb=0x565326304fa8, xid=185613, lsn=lsn@entry=2645271944, node=..., tid=..., cmin=0,
    cmax=4294967295, combocid=4294967295) at /pgsql/source/master/src/backend/replication/logical/reorderbuffer.c:2100
#6 0x0000565324069451 in SnapBuildProcessNewCid (builder=0x56532630afd8, xid=185614, lsn=2645271944, xlrec=0x5653262efc78)
    at /pgsql/source/master/src/backend/replication/logical/snapbuild.c:787

Now this failure goes away if I relax the < to <= in the
complained-about line ... but at this point it's two sanity checks that
I've lobotomized in order to get this to run at all. Not really
comfortable with that.
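
The assertion in the backtrace is an ordering check over the first_lsn of
adjacent top-level transactions. The standalone sketch below shows why the
choice between "<" and "<=" decides the outcome, assuming (an inference from
frames #5 and #6, not something established above) that two top-level entries
end up with the same first LSN. ToyTxn, toy_lsn_order_ok and
toy_same_lsn_case are hypothetical names for illustration only, not code from
reorderbuffer.c; the xids and the LSN are the values printed in the backtrace.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for illustration; not the real reorderbuffer.c types. */
typedef uint64_t ToyLsn;

typedef struct ToyTxn
{
	uint32_t	xid;
	ToyLsn		first_lsn;
} ToyTxn;

/*
 * Return true if first_lsn values ascend across the array.  With
 * strict = true, equal neighbours are rejected (the "<" form); with
 * strict = false they are accepted (the relaxed "<=" form).
 */
static bool
toy_lsn_order_ok(const ToyTxn *txns, size_t ntxns, bool strict)
{
	for (size_t i = 1; i < ntxns; i++)
	{
		ToyLsn		prev = txns[i - 1].first_lsn;
		ToyLsn		cur = txns[i].first_lsn;

		if (strict ? !(prev < cur) : !(prev <= cur))
			return false;
	}
	return true;
}

/*
 * Assumed scenario: two top-level entries whose first LSN is the same
 * record.  The xids and LSN are copied from the backtrace; that both were
 * created as top-level at this LSN is an assumption.
 */
static bool
toy_same_lsn_case(bool strict)
{
	ToyTxn		txns[] = {
		{185614, UINT64_C(2645271944)},
		{185613, UINT64_C(2645271944)},
	};

	return toy_lsn_order_ok(txns, sizeof(txns) / sizeof(txns[0]), strict);
}
```

In this toy model toy_same_lsn_case(true) fails and toy_same_lsn_case(false)
passes, which matches the behaviour described above. It says nothing about
whether equal first LSNs are legitimate in the real ReorderBuffer; that is
exactly the open question.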

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
snapbuild-child.patch text/x-diff 1.6 KB
