Re: [BUGS] Bug in Physical Replication Slots (at least 9.5)?

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, jdnelson(at)dyn(dot)com, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: [BUGS] Bug in Physical Replication Slots (at least 9.5)?
Date: 2017-02-02 06:34:33
Message-ID: CAB7nPqQ05G15JooRMEONgPkW0osot77yaFAUF9_6Q8G+v+2+xg@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Thu, Feb 2, 2017 at 1:26 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> I'm afraid that many WAL segments would start with a continuation record
> when the workload consists of short transactions (e.g., pgbench), which
> would make restart_lsn fall far behind. No?

I don't quite understand this argument. Even if there are many small
transactions, that would only cause restart_lsn to lag by one
segment, consistently.
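
To make that bound concrete, here is a rough sketch of the arithmetic. It
is only a model, not code from the patch: the 16 MB segment size is the
usual default, and the helper names (segment_no, held_restart_lsn,
lag_in_segments) are hypothetical. It assumes restart_lsn is held at the
start of the record that continues into the current segment, and
otherwise follows the flush position.

```python
from typing import Optional

WAL_SEG_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB WAL segments


def segment_no(lsn: int) -> int:
    """Number of the WAL segment that contains byte position lsn."""
    return lsn // WAL_SEG_SIZE


def held_restart_lsn(flush_lsn: int, record_start: Optional[int]) -> int:
    """Hypothetical model of the WAL-sender-side behavior.

    record_start is the start of a record that began in the previous
    segment and continues into the segment containing flush_lsn, or
    None if that segment does not begin with a continuation record.
    """
    if record_start is not None:
        return record_start  # keep the previous segment around
    return flush_lsn


def lag_in_segments(flush_lsn: int, record_start: Optional[int]) -> int:
    """How many segments restart_lsn trails the flush position by."""
    restart = held_restart_lsn(flush_lsn, record_start)
    return segment_no(flush_lsn) - segment_no(restart)


if __name__ == "__main__":
    flush = 5 * WAL_SEG_SIZE + 100
    # Segment 5 begins with the tail of a record started in segment 4.
    print(lag_in_segments(flush, 5 * WAL_SEG_SIZE - 40))
    # Segment 5 begins with a fresh record.
    print(lag_in_segments(flush, None))
```

Under this model the lag never exceeds one segment, however many
segments happen to begin with a continuation record.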

> The discussion on this thread just makes me think that restart_lsn should
> indicate the replay location instead of flush location. This seems safer.

That would penalize WAL retention on the primary with standbys using
recovery_min_apply_delay and a slot for example...

We can attempt to address this problem in two ways. The proposed patch
(ugly, by the way, and it has two typos!) does it in the WAL sender by
not letting restart_lsn jump to the next segment if a continuation
record is found. Alternatively, we could have the standby request the
next segment instead when the record it wants to replay from is at a
segment boundary and it already has the beginning of that record
locally, which it does because it has already confirmed to the primary
that it flushed up to the next segment. I am not sure which fix is
better, though.
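
The standby-side option can be sketched roughly as below. This is an
illustrative model under stated assumptions, not an implementation:
the function standby_request_start and its parameters are hypothetical,
the 16 MB segment size is the usual default, and the model assumes the
standby always asks for streaming to begin at a segment start.

```python
WAL_SEG_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB WAL segments


def segment_no(lsn: int) -> int:
    """Number of the WAL segment that contains byte position lsn."""
    return lsn // WAL_SEG_SIZE


def segment_start(segno: int) -> int:
    """Byte position at which segment segno begins."""
    return segno * WAL_SEG_SIZE


def standby_request_start(record_start: int, record_end: int,
                          local_flush_lsn: int) -> int:
    """Hypothetical choice of where the standby asks streaming to begin.

    record_start / record_end delimit the record the standby wants to
    replay from (record_end is exclusive). local_flush_lsn is how far
    the standby has flushed WAL locally.
    """
    first_seg = segment_no(record_start)
    crosses_boundary = segment_no(record_end - 1) != first_seg
    next_seg_start = segment_start(first_seg + 1)
    # The beginning of the record is on local disk if the standby has
    # already flushed up to (at least) the start of the next segment.
    has_beginning_locally = local_flush_lsn >= next_seg_start
    if crosses_boundary and has_beginning_locally:
        # Ask only for the segment holding the continuation.
        return next_seg_start
    return segment_start(first_seg)
```

With this choice the primary would not need to keep the older segment
for this standby, because the standby no longer asks for it.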
--
Michael
