Re: [HACKERS] Replication status in logical replication

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: David Steele <david(at)pgmasters(dot)net>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Vaishnavi Prabakaran <vaishnaviprabakaran(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Replication status in logical replication
Date: 2018-07-09 08:25:55
Message-ID: CAD21AoC+3jMiSUwQUKJ45vKuSzBy++k_Dd2OEinQ-f5-UFT=pw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 6, 2018 at 2:21 PM, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> On Thu, Jul 05, 2018 at 05:13:27PM +0900, Michael Paquier wrote:
>> This concerns as well v10, so that's not actually an open item...
>> Well, it was an open item last year. The last set of patches is from
>> Simon here:
>> https://www.postgresql.org/message-id/CANP8%2BjLwgsexwdPkBtkN5kdHN5TwV-d%3Di311Tq_FdOmzJ8QyRQ%40mail.gmail.com
>
> logical_repl_caught_up_v4alt2.patch is actually incorrect after I tested
> the thing, and that logical_repl_caught_up_v4.patch gets the correct
> call.
>
>> Simon, do you feel confident with your patch? If yes, could you finish
>> wrapping it? I am getting myself familiar with the problem as this has
>> been around for some time now so I am reviewing the thing as well and
>> then I can board the ship..
>
> Okay, I have spent some time today looking at this patch, and the error
> is very easy to reproduce once you do that in the TAP tests:
> 1) Stop and start once the publisher in one of the tests of
> src/test/subscription.
> 2) Enforce wait_for_catchup() to check for state = 'streaming'.
> And then you would see the tests waiting until timeout is reached and
> then die.
>
> I would be inclined to add those tests in the final patch, the
> disadvantage being that 1) makes one of the test scripts a bit longer,
> but it can reproduce the failures most of the time. Having 2) is
> actually nice for physical replication as the tests in
> src/test/recovery/ use wait_for_catchup() in various ways.
>
> Some other notes about the patch:
> - I switched the error message in WalSndLoop as mentioned upthread for
> nodes catching up, aka no more "primary" but "upstream server".
> - Added a note about using only GetFlushRecPtr in XLogSendLogical as
> logical decoding cannot be used on standby nodes. If one day logical
> decoding gets supports on standby then this would need an update as
> well.
>
> Does this look fine to all the folks involved in this thread? It is
> Friday afternoon here so my brain is getting fried, but I can finish
> wrapping up this patch at the beginning of next week if there are no
> objections. At quick glance this indeed would need a backpatch down to
> 9.4 but I have not spent time testing those configurations yet.

Thank you for updating the patch. The patch looks fine to me, and I
agree with all changes you made.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tsunakawa, Takayuki 2018-07-09 08:29:08 RE: How can we submit code patches that implement our (pending) patents?
Previous Message Craig Ringer 2018-07-09 08:09:53 Re: How can we submit code patches that implement our (pending) patents?