Re: Replication slot stats misgivings

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Subject: Re: Replication slot stats misgivings
Date: 2021-04-27 14:29:07
Message-ID: CAA4eK1K6-8pyt6prAP44tjnw4m4KP=1_EUMOH97nfP0TqUOJkg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 27, 2021 at 5:40 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Apr 27, 2021 at 8:58 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> I have pushed this patch and seeing one buildfarm failure:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2021-04-27%2009%3A23%3A14
>
> starting permutation: s1_init s1_begin s1_insert_tbl1 s1_insert_tbl2
> s2_alter_tbl1_char s1_commit s2_get_changes
> + isolationtester: canceling step s1_init after 314 seconds
> step s1_init: SELECT 'init' FROM
> pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
> ?column?
>
> I am analyzing this.
>

After checking below logs corresponding to this test, it seems test
has been executed and create_slot was successful:
2021-04-27 11:06:43.770 UTC [17694956:52] isolation/concurrent_ddl_dml
STATEMENT: SELECT 'init' FROM
pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
2021-04-27 11:07:11.748 UTC [5243096:9] LOG: checkpoint starting: time
2021-04-27 11:09:24.332 UTC [5243096:10] LOG: checkpoint complete:
wrote 14 buffers (0.1%); 0 WAL file(s) added, 0 removed, 0 recycled;
write=0.716 s, sync=0.001 s, total=132.584 s; sync files=0,
longest=0.000 s, average=0.000 s; distance=198 kB, estimate=406 kB
2021-04-27 11:09:40.116 UTC [6226046:1] [unknown] LOG: connection
received: host=[local]
2021-04-27 11:09:40.117 UTC [17694956:53] isolation/concurrent_ddl_dml
LOG: statement: BEGIN;
2021-04-27 11:09:40.117 UTC [17694956:54] isolation/concurrent_ddl_dml
LOG: statement: INSERT INTO tbl1 (val1, val2) VALUES (1, 1);
2021-04-27 11:09:40.118 UTC [17694956:55] isolation/concurrent_ddl_dml
LOG: statement: INSERT INTO tbl2 (val1, val2) VALUES (1, 1);
2021-04-27 11:09:40.119 UTC [10944636:49] isolation/concurrent_ddl_dml
LOG: statement: ALTER TABLE tbl1 ALTER COLUMN val2 TYPE character
varying;

I am not sure but there is some possibility that even though create
slot is successful, the isolation tester got successful in canceling
it, maybe because create_slot is just finished at the same time. As we
can see from logs, during this test checkpoint also happened which
could also lead to the slowness of this particular command.

Also, I see a lot of messages like below which indicate stats
collector is also quite slow:
2021-04-27 10:57:59.385 UTC [18743536:1] LOG: using stale statistics
instead of current ones because stats collector is not responding

I am not sure if the timeout happened because the machine is slow or
is it in any way related to code. I am seeing some previous failures
due to timeout on this machine [1][2]. In those failures, I see the
"using stale stats...." message. Also, I am not able to see why it can
fail due to this patch?

[1] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2021-02-23%2004%3A23%3A56
[2] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2020-12-24%2005%3A31%3A43

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-04-27 14:38:23 Re: Better sanity checking for message length words
Previous Message PG Doc comments form 2021-04-27 14:26:48 Clarify how triggers relate to transactions