Re: [BUGS] Bug in Physical Replication Slots (at least 9.5)?

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: michael(dot)paquier(at)gmail(dot)com
Cc: andres(at)anarazel(dot)de, nag1010(at)gmail(dot)com, jdnelson(at)dyn(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] Bug in Physical Replication Slots (at least 9.5)?
Date: 2017-10-26 10:05:51
Message-ID: 20171026.190551.208996945.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-bugs pgsql-hackers

Hello. Thank you for looking at this.

At Mon, 16 Oct 2017 17:58:03 +0900, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote in <CAB7nPqR+J1Xw+yzfsrehiQ+rh3ac+n5sEUgP7UOQ4_ymFnO9wg(at)mail(dot)gmail(dot)com>
> On Thu, Sep 7, 2017 at 12:33 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > At Wed, 6 Sep 2017 12:23:53 -0700, Andres Freund <andres(at)anarazel(dot)de> wrote in <20170906192353(dot)ufp2dq7wm5fd6qa7(at)alap3(dot)anarazel(dot)de>
> >> I'm not following. All we need to use is the beginning of the relevant
> >> records, that's easy enough to keep track of. We don't need to read the
> >> WAL or anything.
> >
> > The beginning is already tracked and nothing more to do.
>
> I have finally allocated time to look at your newly-proposed patch,
> sorry for the time it took. Patch 0002 forgot to include sys/stat.h to
> allow the debugging tool to compile :)
>
> > The first *problem* was WaitForWALToBecomeAvailable requests the
> > beginning of a record, which is not on the page the function has
> > been told to fetch. Still tliRecPtr is required to determine the
> > TLI to request, it should request RecPtr to be streamed.
>
> [...]
>
> > The rest to do is let XLogPageRead retry other sources
> > immediately. To do this I made ValidXLogPageHeader(at)xlogreader(dot)c
> > public (and renamed to XLogReaderValidatePageHeader).
> >
> > The patch attached fixes the problem and passes recovery
> > tests. However, the test for this problem is not added. It needs
> > to go to the last page in a segment then put a record continues
> > to the next segment, then kill the standby after receiving the
> > previous segment but before receiving the whole record.
>
> +typedef struct XLogPageHeaderData *XLogPageHeader;
> [...]
> +/* Validate a page */
> +extern bool XLogReaderValidatePageHeader(XLogReaderState *state,
> + XLogRecPtr recptr, XLogPageHeader hdr);
> Instead of exposing XLogPageHeaderData, I would recommend just using
> char* and remove this declaration. The comment on top of
> XLogReaderValidatePageHeader needs to make clear what caller should
> provide.

Seems reasonable. I added several lines to the comment for the
function.

> + if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr,
> + (XLogPageHeader) readBuf))
> + goto next_record_is_invalid;
> +
> [...]
> - ptr = tliRecPtr;
> + ptr = RecPtr;
> tli = tliOfPointInHistory(tliRecPtr, expectedTLEs);
>
> if (curFileTLI > 0 && tli < curFileTLI)
> The core of the patch is here (the patch has no comment so it is hard
> to understand what's the point of what is being done), and if I

Hmm, sorry. Added a brief comment there.

> understand that correctly, you allow the receiver to fetch the
> portions of a record spawned across multiple segments from different
> sources, and I am not sure that we'd want to break that promise.

We already allow consecutive records at a segment boundary that
come from different sources to be in the same series of xlog
records. A continuation record never spans two TLIs, but I might
be missing something here. (I found that an error message shows
an incorrect record pointer. The error message still seems to be
useful.)

> Shouldn't we instead have the receiver side track the beginning of the
> record and send that position for the physical slot's restart_lsn?

The largest obstacle to doing that is that walreceiver is not
concerned with record internals at all. In other words, it doesn't
know what a record is. Teaching it that would introduce much
complexity, and that complexity would slow down the walreceiver.

In addition to that, this "problem" also occurs on replication
without a slot. The latest patch also helps in that case.
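
To make the segment-boundary case concrete, here is a minimal
sketch. It is not PostgreSQL code: the names Lsn, seg_no,
page_start, record_crosses_segment and request_ptr are hypothetical,
and it assumes the default 16 MB segment size and 8 kB page size.
It only shows that the start of a record (what tliRecPtr points to)
and the page actually needed (RecPtr) can fall in different
segments, so that requesting from the record start asks for a
segment the upstream may no longer have.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustration only: hypothetical names, not PostgreSQL's definitions. */
typedef uint64_t Lsn;

#define WAL_SEG_SIZE  (16u * 1024 * 1024)   /* assumed default segment size */
#define WAL_PAGE_SIZE 8192u                 /* assumed default page size */

/* Segment number that contains the given position. */
static uint64_t seg_no(Lsn p)
{
    return p / WAL_SEG_SIZE;
}

/* Start of the page that contains the given position. */
static Lsn page_start(Lsn p)
{
    return p - (p % WAL_PAGE_SIZE);
}

/* True if a record of len bytes starting at rec_start ends in a later segment. */
static bool record_crosses_segment(Lsn rec_start, uint32_t len)
{
    return seg_no(rec_start) != seg_no(rec_start + len - 1);
}

/*
 * Position to request from a WAL source when the page containing
 * `needed` is missing.  Simplified model of the one-line change
 * discussed above: before the patch the record start was requested,
 * with the patch the needed position itself is requested.
 */
static Lsn request_ptr(Lsn rec_start, Lsn needed, bool patched)
{
    return patched ? needed : rec_start;
}
```

With a record that starts 100 bytes before the end of one segment
and is 300 bytes long, the unpatched request falls in the earlier
segment while the patched request falls in the segment that is
actually missing on the standby.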

> This way the receiver would retain WAL segments from the real
> beginning of a record. restart_lsn for replication slots is set when
> processing the standby message in ProcessStandbyReplyMessage() using
> now the flush LSN, so a more correct value should be provided using
> that. Andres, what's your take on the matter?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0001-Allow-switch-WAL-source-midst-of-record.patch text/x-patch 5.1 KB
