Re: Logical replication keepalive flood

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Greg Nancarrow <gregn4422(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Abbas Butt <abbas(dot)butt(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Zahid Iqbal <zahid(dot)iqbal(at)enterprisedb(dot)com>
Subject: Re: Logical replication keepalive flood
Date: 2021-09-17 07:11:56
Message-ID: CAHut+Pvtk2vTk-UeL4Y2cdTHx6TDkUXx6PT2Rg5n0aYV4AfCWQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 16, 2021 at 10:59 AM houzj(dot)fnst(at)fujitsu(dot)com
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> From Tuesday, September 14, 2021 1:39 PM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
> > However, the problem I found is that, with the patch applied, there is
> > a test failure when running “make check-world”:
> >
> > t/006_logical_decoding.pl ............ 4/14
> > # Failed test 'pg_recvlogical acknowledged changes'
> > # at t/006_logical_decoding.pl line 117.
> > # got: 'BEGIN
> > # table public.decoding_test: INSERT: x[integer]:5 y[text]:'5''
> > # expected: ''
> > # Looks like you failed 1 test of 14.
> > t/006_logical_decoding.pl ............ Dubious, test returned 1 (wstat
> > 256, 0x100) Failed 1/14 subtests
> >
> >
>
> After applying the patch,
> I saw the same problem and can reproduce it by the following steps:
>
> 1) execute the SQLs.
> -----------SQL-----------
> CREATE TABLE decoding_test(x integer, y text);
> SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding');
> INSERT INTO decoding_test(x,y) SELECT s, s::text FROM generate_series(1,4) s;
>
> -- use the lsn here to execute pg_recvlogical later
> SELECT lsn FROM pg_logical_slot_peek_changes('test_slot', NULL, NULL) ORDER BY lsn DESC LIMIT 1;
> INSERT INTO decoding_test(x,y) SELECT s, s::text FROM generate_series(5,50) s;
> ----------------------
>
> 2) Then, if I execute the following command twice:
> # pg_recvlogical -E lsn -d postgres -S 'test_slot' --start --no-loop -f -
> BEGIN 708
> table public.decoding_test: INSERT: x[integer]:1 y[text]:'1'
> table public.decoding_test: INSERT: x[integer]:2 y[text]:'2'
> table public.decoding_test: INSERT: x[integer]:3 y[text]:'3'
> table public.decoding_test: INSERT: x[integer]:4 y[text]:'4'
> COMMIT 708
>
> # pg_recvlogical -E lsn -d postgres -S 'test_slot' --start --no-loop -f -
> BEGIN 709
>
> It still generated ' BEGIN 709' in the second time execution.
> But it will output nothing in the second time execution without the patch.
>

Hello Hous-san, thanks for including the steps. Unfortunately, no
matter what I tried, I could never get the patch to display the
problem "BEGIN 709" for the 2nd time execution of pg_recvlogical

After discussion offline (thanks Greg!) it was found that the
pg_recvlogical step 2 posted above is not quite identical to what the
TAP 006 test is doing.

Specifically, the TAP test also includes some other options (-o
include-xids=0 -o skip-empty-xacts=1) which are not in your step.

If I include these options then I can reproduce the problem.
-----------------------------------------
[postgres(at)CentOS7-x64 ~]$ pg_recvlogical -E '0/150B5E0' -d postgres
-S 'test_slot' --start --no-loop -o include-xids=0 -o
skip-empty-xacts=1 -f -
BEGIN
table public.decoding_test: INSERT: x[integer]:5 y[text]:'5'
-----------------------------------------

I don't know why these options should make any difference but they do.
Perhaps they cause a fluke of millisecond timing differences in our
different VM environments.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2021-09-17 07:18:30 Re: walsender timeout on logical replication set
Previous Message Bharath Rupireddy 2021-09-17 06:33:54 Re: Refactoring postgres_fdw code to rollback remote transaction