Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1

From: Andres Freund <andres(at)anarazel(dot)de>
To: Lakshmi Narayanan Sreethar <lakshmi(at)timescale(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: DROP DATABASE deadlocks with logical replication worker in PG 15.1
Date: 2023-01-14 16:02:01
Message-ID: 20230114160201.en5g5bna66x6lnuw@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi,

Good catch.

The problem is here:

On 2023-01-13 20:53:49 +0530, Lakshmi Narayanan Sreethar wrote:
> #7 0x0000559cccbe1e71 in LogicalRepSyncTableStart
> (origin_startpos=0x7fffb26f7728) at
> /pg15.1/src/backend/replication/logical/tablesync.c:1353

Because the logical rep code explicitly prevents interrupts:

/*
* Create a new permanent logical decoding slot. This slot will be used
* for the catchup phase after COPY is done, so tell it to use the
* snapshot to make the final data consistent.
*
* Prevent cancel/die interrupts while creating slot here because it is
* possible that before the server finishes this command, a concurrent
* drop subscription happens which would complete without removing this
* slot leading to a dangling slot on the server.
*/
HOLD_INTERRUPTS();
walrcv_create_slot(LogRepWorkerWalRcvConn,
slotname, false /* permanent */ , false /* two_phase */ ,
CRS_USE_SNAPSHOT, origin_startpos);
RESUME_INTERRUPTS();

Which is just completely entirely wrong. Independent of this issue even. Not
allowing termination for the duration of command executed over network?

This is from:

commit 6b67d72b604cb913e39324b81b61ab194d94cba0
Author: Amit Kapila <akapila(at)postgresql(dot)org>
Date: 2021-03-17 08:15:12 +0530

Fix race condition in drop subscription's handling of tablesync slots.

Commit ce0fdbfe97 made tablesync slots permanent and allow Drop
Subscription to drop such slots. However, it is possible that before
tablesync worker could get the acknowledgment of slot creation, drop
subscription stops it and that can lead to a dangling slot on the
publisher. Prevent cancel/die interrupts while creating a slot in the
tablesync worker.

Reported-by: Thomas Munro as per buildfarm
Author: Amit Kapila
Reviewed-by: Vignesh C, Takamichi Osumi
Discussion: https://postgr.es/m/CA+hUKGJG9dWpw1cOQ2nzWU8PHjm=PTraB+KgE5648K9nTfwvxg@mail.gmail.com

But this can't be the right fix.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2023-01-14 16:05:45 Re: BUG #17751: DDL CREATE statement accepts invalid default values for FOREIGN KEY constraint.
Previous Message PG Bug reporting form 2023-01-14 08:32:15 BUG #17751: DDL CREATE statement accepts invalid default values for FOREIGN KEY constraint.