Re: [HACKERS] Replication status in logical replication

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Vaishnavi Prabakaran <vaishnaviprabakaran(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Replication status in logical replication
Date: 2017-11-21 21:06:29
Message-ID: CAD21AoAMZSdqQzBBp2fu6a0HjWcZS_TFb_8+74SH7VUXtU6Mww@mail.gmail.com
Lists: pgsql-hackers

On Tue, Nov 14, 2017 at 6:46 AM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Tue, Sep 26, 2017 at 3:45 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> On Tue, Sep 26, 2017 at 10:36 AM, Vaishnavi Prabakaran
>> <vaishnaviprabakaran(at)gmail(dot)com> wrote:
>>> On Wed, Sep 13, 2017 at 9:59 AM, Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:
>>>> I’m not entirely sure why this was flagged as "Waiting for Author" by the
>>>> automatic run; the patch applies for me and builds, so resetting back to
>>>> "Needs review".
>>>>
>>>
>>> This patch applies and builds cleanly. I tested it with one publisher
>>> and one subscriber, and can confirm that the replication state after
>>> restarting the server is now "streaming" and not "catchup".
>>>
>>> I don't find any issues with the code, so to me the patch is ready for
>>> committer; I have marked it as such in the CF entry.
>
> Hi Sawada-san,
>
> My patch-testing robot doesn't like this patch[1]. I just tried it on
> my laptop to double-check and get some more details, and saw the same
> failures:
>
> (1) "make check" under src/test/recovery fails like this:
>
> t/006_logical_decoding.pl ............ 2/16 # Looks like your test
> exited with 29 just after 4.
> t/006_logical_decoding.pl ............ Dubious, test returned 29
> (wstat 7424, 0x1d00)
> Failed 12/16 subtests
>
> regress_log_006_logical_decoding says:
>
> ok 4 - got same expected output from pg_recvlogical decoding session
> pg_recvlogical timed out at
> /opt/local/lib/perl5/vendor_perl/5.24/IPC/Run.pm line 2918.
> waiting for endpos 0/1609B60 with stdout '', stderr '' at
> /Users/munro/projects/postgres/src/test/recovery/../../../src/test/perl/PostgresNode.pm
> line 1700.
> ### Stopping node "master" using mode immediate
> # Running: pg_ctl -D
> /Users/munro/projects/postgres/src/test/recovery/tmp_check/t_006_logical_decoding_master_data/pgdata
> -m immediate stop
> waiting for server to shut down.... done
> server stopped
> # No postmaster PID for node "master"
> # Looks like your test exited with 29 just after 4.
>
> (2) "make check" under src/test/subscription says:
>
> t/001_rep_changes.pl .. ok
> t/002_types.pl ........ #
> # Looks like your test exited with 60 before it could output anything.
> t/002_types.pl ........ Dubious, test returned 60 (wstat 15360, 0x3c00)
> Failed 3/3 subtests
> t/003_constraints.pl ..
>
> Each of those took several minutes, and I stopped it there. It may
> be going to say some more things but is taking a very long time
> (presumably timing out, but the 001 took ages and then succeeded...
> hmm). In fact I had to run this on my laptop to see that because on
> Travis CI the whole test job just gets killed after 10 minutes of
> non-output and the above output was never logged because of the way
> concurrent test jobs' output is buffered.
>
> I didn't try to figure out what is going wrong.
>

Thank you for the notification!

After investigating, I found that my previous patch took the wrong
direction. I should have changed XLogSendLogical() so that it checks
the read LSN and sets WalSndCaughtUp = true even after reading a
record without waiting. The attached updated patch passes
'make check-world'. Please review it.
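
To illustrate the idea, here is a toy model in Python. It is only a sketch
and not the attached patch or any PostgreSQL code: the class ToyWalSender,
its fields, and the integer "LSNs" are all hypothetical stand-ins. It assumes
the caught-up flag was previously set only when the sender found nothing to
read and would have had to wait, and it shows what changes when the flag is
also checked right after a record has been read.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyWalSender:
    """Hypothetical stand-in for a logical walsender (not PostgreSQL code)."""
    record_ends: List[int]    # end position of each WAL record, ascending
    flush_ptr: int            # how far WAL has been flushed
    check_after_read: bool    # True models the behaviour proposed above
    next_index: int = 0
    sent_ptr: int = 0
    caught_up: bool = False

    def send_logical(self) -> None:
        """One send iteration, loosely modelled on XLogSendLogical()."""
        self.caught_up = False
        no_more_records = self.next_index >= len(self.record_ends)
        if no_more_records or self.record_ends[self.next_index] > self.flush_ptr:
            # Nothing readable without waiting: the only place the flag
            # is set in the "old" toy behaviour.
            self.caught_up = True
            return
        # A record was read without waiting.
        self.sent_ptr = self.record_ends[self.next_index]
        self.next_index += 1
        # Proposed behaviour: compare the read position with the flush
        # position even though we did not have to wait.
        if self.check_after_read and self.sent_ptr >= self.flush_ptr:
            self.caught_up = True


def run(check_after_read: bool) -> List[bool]:
    """Return the caught-up flag after each of three send iterations."""
    sender = ToyWalSender(record_ends=[10, 20, 30], flush_ptr=30,
                          check_after_read=check_after_read)
    flags = []
    for _ in range(3):
        sender.send_logical()
        flags.append(sender.caught_up)
    return flags


if __name__ == "__main__":
    print("old toy behaviour:     ", run(False))
    print("proposed toy behaviour:", run(True))
```

In this model the sender has consumed all flushed WAL after the third
iteration, but only the variant that checks after reading reports itself as
caught up at that point; the other one needs a further iteration that finds
nothing to read.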

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment Content-Type Size
logical_repl_caught_up_v2.patch application/octet-stream 572 bytes
