Re: Duplicated LSN in ReorderBuffer

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Ildar Musin <ildar(at)adjust(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Duplicated LSN in ReorderBuffer
Date: 2019-07-09 10:04:08
Message-ID: CAD21AoBuMEhhoPyxEcnhbyhVaAKv1vfb3906My_qknAkjKoaiQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 8, 2019 at 11:46 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Mon, Jul 8, 2019 at 9:00 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> >
> > On Wed, Jun 26, 2019 at 2:46 AM Ildar Musin <ildar(at)adjust(dot)com> wrote:
> > > Attached is a simple patch that uses subxid instead of top-level xid
> > > in ReorderBufferAddNewTupleCids() call. It seems to fix the bug, but
> > > i'm not sure that this is a valid change. Can someone please verify it
> > > or maybe suggest a better solution for the issue?
> >
>
> I've reproduced this issue with script Ildar provided. I don't find
> out the root cause yet and I'm not sure the patch takes a correct way
> to fix this.
>
> In my environment, I got the following pg_waldump output and the
> logical decoding failed at 0/019FA058 when processing NEW_CID.
>
> 90489 rmgr: Transaction len (rec/tot): 38/ 38, tx:
> 1999, lsn: 0/019F9E80, prev 0/019F9E38, desc: ASSIGNMENT xtop 1998:
> subxacts: 1999
> 90490 rmgr: Standby len (rec/tot): 405/ 405, tx:
> 0, lsn: 0/019F9EA8, prev 0/019F9E80, desc: RUNNING_XACTS nextXid 2000
> latestCompletedXid 1949 oldestRunningXid 1836; 48 xacts: 1990 1954
> 1978 1850 1944 1972 1940 1924 1906 1970 1985 1998 1966 1987 1975 1858
> 1914 1982 1958 1840 1920 1926 1992 1962 1
> 90490 910 1950 1874 1928 1974 1968 1946 1912 1918 1996 1922 1930
> 1964 1952 1994 1934 1980 1836 1984 1960 1956 1916 1908 1938
> 90491 rmgr: Heap2 len (rec/tot): 60/ 60, tx:
> 1999, lsn: 0/019FA058, prev 0/019F9EA8, desc: NEW_CID rel
> 1663/12678/2615; tid 11/59; cmin: 0, cmax: 4294967295, combo:
> 4294967295
> 90492 rmgr: Heap len (rec/tot): 127/ 127, tx:
> 1999, lsn: 0/019FA098, prev 0/019FA058, desc: INSERT off 59 flags
> 0x00, blkref #0: rel 1663/12678/2615 blk 11
>
> I thought that the logical decoding doesn't create ReorderBufferTXN of
> xid=1999 when processing NEW_CID since it decodes ASSIGNMENT of
> xid=1999 beforehand. But what actually happen is that it skips NEW_CID
> since the state of snapshot builder is SNAPBUILD_BUILDING_SNAPSHOT yet
> and then the state becomes SNAPBUILD_FULL_SNAPSHOT when processing
> RUNNING_XACTS , and therefore it creates two ReorderBufferTXN entries
> for xid = 1999 and xid = 1998 as top-level transactions when
> processing NEW_CID (ReorderBufferXidSetCatalogChanges creates xid=1999
> and ReorderBufferAddNewTupleCids creates xid = 1998).

I think the cause of this bug would be that a ReorderBufferTXN entry
of sub transaction is created as top-level transaction. And this
happens because we skip to decode ASSIGNMENT during the state of
snapshot builder < SNAPBUILD_FULL.

@@ -778,7 +778,7 @@ SnapBuildProcessNewCid(SnapBuild *builder,
TransactionId xid,
*/
ReorderBufferXidSetCatalogChanges(builder->reorder, xid, lsn);

- ReorderBufferAddNewTupleCids(builder->reorder, xlrec->top_xid, lsn,
+ ReorderBufferAddNewTupleCids(builder->reorder, xid, lsn,
xlrec->target_node, xlrec->target_tid,
xlrec->cmin, xlrec->cmax,
xlrec->combocid);

The above change in the proposed patch changes SnapBuildProcessNewCid
so that it passes sub transaction id instead of top transaction id to
ReorderBufferAddNewTupleCids that adds a (relfilenode, tid) -> (cmin,
cmax) mapping to the transaction. But I think the fix is not correct
since as the comment of ReorderBufferTXN describes, the mappings are
always assigned to the top-level transaction.

in reorderbuffer.h,
/*
* List of (relation, ctid) => (cmin, cmax) mappings for catalog tuples.
* Those are always assigned to the toplevel transaction. (Keep track of
* #entries to create a hash of the right size)
*/
dlist_head tuplecids;
uint64 ntuplecids;

Instead, I wonder if we can decode ASSIGNMENT even when the state of
snapshot builder < SNAPBUILD_FULL_SNAPSHOT. That way, the
ReorderBufferTXN entries of both top transaction and sub transaction
are created properly before we decode NEW_CID.

Attached patch do that. In my environment the issue seems to be fixed
but I'm still not confident that this is the right fix. Please review
it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment Content-Type Size
always_decode_assignment.patch application/octet-stream 1.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2019-07-09 10:06:39 Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Previous Message Peter Eisentraut 2019-07-09 09:43:28 Re: errbacktrace