Re: Replication slot drop message is sent after pgstats shutdown.

From: Noah Misch <noah(at)leadboat(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Replication slot drop message is sent after pgstats shutdown.
Date: 2022-03-18 07:28:37
Message-ID: 20220318072837.GC2739027@rfd.leadboat.com
Lists: pgsql-hackers

On Tue, Feb 15, 2022 at 08:58:56AM -0800, Andres Freund wrote:
> Pushed the test yesterday evening, after Tom checked if it is likely to be
> problematic. Seems to have worked without problems so far.

wrasse │ 2022-02-15 09:29:06 │ HEAD │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-02-15%2009%3A29%3A06
flaviventris │ 2022-02-24 15:17:30 │ HEAD │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2022-02-24%2015%3A17%3A30
calliphoridae │ 2022-03-08 01:14:51 │ HEAD │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae&dt=2022-03-08%2001%3A14%3A51

The buildfarm failed to convey adequate logs for this particular test suite.
Here's regression.diffs from the wrasse case (saved via keep_error_builds):

===
diff -U3 /export/home/nm/farm/studio64v12_6/HEAD/pgsql/contrib/test_decoding/expected/slot_creation_error.out /export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/contrib/test_decoding/output_iso/results/slot_creation_error.out
--- /export/home/nm/farm/studio64v12_6/HEAD/pgsql/contrib/test_decoding/expected/slot_creation_error.out Tue Feb 15 06:58:14 2022
+++ /export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/contrib/test_decoding/output_iso/results/slot_creation_error.out Tue Feb 15 11:38:14 2022
@@ -29,16 +29,17 @@
 t
 (1 row)

-step s2_init: <... completed>
-ERROR: canceling statement due to user request
 step s1_view_slot:
     SELECT slot_name, slot_type, active FROM pg_replication_slots WHERE slot_name = 'slot_creation_error'

-slot_name|slot_type|active
----------+---------+------
-(0 rows)
+slot_name          |slot_type|active
+-------------------+---------+------
+slot_creation_error|logical  |t
+(1 row)

 step s1_c: COMMIT;
+step s2_init: <... completed>
+ERROR: canceling statement due to user request

 starting permutation: s1_b s1_xid s2_init s1_c s1_view_slot s1_drop_slot
 step s1_b: BEGIN;
===

I can make it fail that way by injecting a 1s delay here:

--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3339,6 +3339,7 @@ ProcessInterrupts(void)
 		 */
 		if (!DoingCommandRead)
 		{
+			pg_usleep(1 * 1000 * 1000);
 			LockErrorCleanup();
 			ereport(ERROR,
 					(errcode(ERRCODE_QUERY_CANCELED),

I plan to fix this as attached, similar to how commit c04c767 fixed the same
challenge in detach-partition-concurrently-[34].
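
For readers following along, the failure mode can be pictured with a toy
timeline model. This is only a sketch, not PostgreSQL code: the function and
parameter names are hypothetical, and it assumes that pg_cancel_backend()
returns as soon as the signal is sent, while the canceled backend raises the
error and cleans up its slot some time later. The injected pg_usleep()
corresponds to a large cancel_latency_ms.

```python
# Toy timeline model of the cancel race; not PostgreSQL code.
# All names here are hypothetical and exist only for illustration.

def isolation_output(cancel_latency_ms: int, view_at_ms: int = 100) -> list[str]:
    """Return the two observable events in the order they would be reported.

    cancel_latency_ms: time between s1 sending the cancel and s2 actually
        erroring out and releasing its slot (the injected delay stretches this).
    view_at_ms: time at which s1 runs s1_view_slot after sending the cancel.
    """
    # The slot is still visible if s2 has not processed the cancel yet.
    slot_still_visible = cancel_latency_ms > view_at_ms

    rows = "1 row" if slot_still_visible else "0 rows"
    view = f"s1_view_slot: {rows}"
    cancel = "s2_init: ERROR: canceling statement due to user request"

    # Whichever event happens first is reported first.
    return [view, cancel] if slot_still_visible else [cancel, view]


if __name__ == "__main__":
    # Cancel processed promptly: matches the expected output file.
    print(isolation_output(cancel_latency_ms=0))
    # Cancel processed 1s late: matches the failing regression.diffs.
    print(isolation_output(cancel_latency_ms=1000))
```

The expected output file encodes only the first ordering, so any run in which
the cancel is processed after s1_view_slot produces a diff like the one above.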

Attachment                                 Content-Type   Size
slot_creation_error-cancel-race-v1.patch   text/plain     2.8 KB
