Re: Replication slot drop message is sent after pgstats shutdown.

From: Andres Freund <andres(at)anarazel(dot)de>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Replication slot drop message is sent after pgstats shutdown.
Date: 2022-03-18 20:24:15
Message-ID: 20220318202415.6mxm6yjvlmjy7tlo@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2022-03-18 00:28:37 -0700, Noah Misch wrote:
> On Tue, Feb 15, 2022 at 08:58:56AM -0800, Andres Freund wrote:
> > Pushed the test yesterday evening, after Tom checked if it is likely to be
> > problematic. Seems to worked without problems so far.
>
> wrasse │ 2022-02-15 09:29:06 │ HEAD │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-02-15%2009%3A29%3A06
> flaviventris │ 2022-02-24 15:17:30 │ HEAD │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2022-02-24%2015%3A17%3A30
> calliphoridae │ 2022-03-08 01:14:51 │ HEAD │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae&dt=2022-03-08%2001%3A14%3A51

Huh. Weirdly enough I saw this failure twice in a development branch
yesterday...

> The buildfarm failed to convey adequate logs for this particular test suite.
> Here's regression.diffs from the wrasse case (saved via keep_error_builds):

Thanks for getting that!

> ===
> diff -U3 /export/home/nm/farm/studio64v12_6/HEAD/pgsql/contrib/test_decoding/expected/slot_creation_error.out /export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/contrib/test_decoding/output_iso/results/slot_creation_error.out
> --- /export/home/nm/farm/studio64v12_6/HEAD/pgsql/contrib/test_decoding/expected/slot_creation_error.out Tue Feb 15 06:58:14 2022
> +++ /export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/contrib/test_decoding/output_iso/results/slot_creation_error.out Tue Feb 15 11:38:14 2022
> @@ -29,16 +29,17 @@
> t
> (1 row)
>
> -step s2_init: <... completed>
> -ERROR: canceling statement due to user request
> step s1_view_slot:
> SELECT slot_name, slot_type, active FROM pg_replication_slots WHERE slot_name = 'slot_creation_error'
>
> -slot_name|slot_type|active
> ----------+---------+------
> -(0 rows)
> +slot_name |slot_type|active
> +-------------------+---------+------
> +slot_creation_error|logical |t
> +(1 row)
>
> step s1_c: COMMIT;
> +step s2_init: <... completed>
> +ERROR: canceling statement due to user request
>
> starting permutation: s1_b s1_xid s2_init s1_c s1_view_slot s1_drop_slot
> step s1_b: BEGIN;
> ===
>
> I can make it fail that way by injecting a 1s delay here:
>
> --- a/src/backend/tcop/postgres.c
> +++ b/src/backend/tcop/postgres.c
> @@ -3339,6 +3339,7 @@ ProcessInterrupts(void)
> */
> if (!DoingCommandRead)
> {
> + pg_usleep(1 * 1000 * 1000);
> LockErrorCleanup();
> ereport(ERROR,
> (errcode(ERRCODE_QUERY_CANCELED),

So isolationtester still sees the blocking condition from before the cancel
processing is finished and thus proceeds to the next statement - which it
normally should only do once the other running step is detected as still
blocking?

I wonder if we should emit <waiting> everytime a step is detected anew as
being blocked to make it easier to understand issues like this.

> diff --git a/contrib/test_decoding/specs/slot_creation_error.spec b/contrib/test_decoding/specs/slot_creation_error.spec
> index 6816696..d1e35bf 100644
> --- a/contrib/test_decoding/specs/slot_creation_error.spec
> +++ b/contrib/test_decoding/specs/slot_creation_error.spec
> @@ -35,7 +35,7 @@ step s2_init {
> # The tests first start a transaction with an xid assigned in s1, then create
> # a slot in s2. The slot creation waits for s1's transaction to end. Instead
> # we cancel / terminate s2.
> -permutation s1_b s1_xid s2_init s1_view_slot s1_cancel_s2 s1_view_slot s1_c
> +permutation s1_b s1_xid s2_init s1_view_slot s1_cancel_s2(s2_init) s1_view_slot s1_c
> permutation s1_b s1_xid s2_init s1_c s1_view_slot s1_drop_slot # check slot creation still works
> -permutation s1_b s1_xid s2_init s1_terminate_s2 s1_c s1_view_slot
> +permutation s1_b s1_xid s2_init s1_terminate_s2(s2_init) s1_c s1_view_slot
> # can't run tests after this, due to s2's connection failure

That looks good to me.

Thanks!

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Borisov 2022-03-18 20:41:00 Re: Probable CF bot degradation
Previous Message Andres Freund 2022-03-18 20:05:32 Re: Proposal: Support custom authentication methods using hooks