Re: [BUGS] Bug in Physical Replication Slots (at least 9.5)?

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: michael(dot)paquier(at)gmail(dot)com
Cc: masao(dot)fujii(at)gmail(dot)com, jdnelson(at)dyn(dot)com, pgsql-hackers(at)postgresql(dot)org, pgsql-bugs(at)postgresql(dot)org
Subject: Re: [BUGS] Bug in Physical Replication Slots (at least 9.5)?
Date: 2017-02-03 03:16:49
Message-ID: 20170203.121649.184184028.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-bugs pgsql-hackers

At Thu, 2 Feb 2017 15:34:33 +0900, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote in <CAB7nPqQ05G15JooRMEONgPkW0osot77yaFAUF9_6Q8G+v+2+xg(at)mail(dot)gmail(dot)com>
> On Thu, Feb 2, 2017 at 1:26 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> > I'm afraid that many WAL segments would start with a continuation record
> > under a workload of short transactions (e.g., pgbench), which would
> > make restart_lsn fall far behind. No?
>
> I don't quite understand this argument. Even if there are many small
> transactions, that would cause restart_lsn to just be late by one
> segment, all the time.
>
> > The discussion on this thread just makes me think that restart_lsn should
> > indicate the replay location instead of flush location. This seems safer.
>
> That would penalize WAL retention on the primary with standbys using
> recovery_min_apply_delay and a slot for example...
>
> We can attempt to address this problem two ways. The patch proposed
> (ugly btw and there are two typos!) is doing it in the WAL sender by
> not making restart_lsn jump to the next segment if a continuation
> record is found.

Sorry for the ug..:p Anyway, the previous version was not the
latest; the attached one is the revised version. (Sorry, I
haven't found the typos myself..)

> Or we could have the standby request the next
> segment instead if the record it wants to replay from is at a boundary
> and it locally has the beginning of the record, which it has
> because it already confirmed to the primary that it flushed to the
> next segment. Not sure which fix is better though.

We could do it as I said, with some refactoring of ReadRecord involving
the reader plugin mechanism..

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment: 0001-Fix-a-bug-of-physical-replication-slot.patch (text/x-patch, 7.3 KB)
