Re: Refactor recovery conflict signaling a little

From: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Refactor recovery conflict signaling a little
Date: 2026-01-23 02:41:40
Message-ID: EA619211-93E3-40BF-8EC0-AED29B87AC33@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Jan 23, 2026, at 07:20, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> I had a look at recovery conflict signaling and a few things caught my eye. No functional changes, but some cleanups and readability improvements:
>
> Patch 0001: Remove useless errdetail_abort()
> --------------------------------------------
>
> The function is supposed to add DETAIL to errors when you are in an aborted transaction, if the transaction was aborted by a recovery conflict, like this:
>
> ERROR: current transaction is aborted, commands ignored until end of transaction block"
> DETAIL: Abort reason: recovery conflict
>
> But I don't see how to reach that. If a transaction is aborted by recovery conflict, you get a different error like this:
>
> ERROR: canceling statement due to conflict with recovery
> DETAIL: User was holding a relation lock for too long.
>
> The transaction abort clears the 'recoveryConflictPending' flag, so even if that happens in a transaction block, you don't get that "DETAIL: Abort reason: recovery conflict" in the subsequent errors.
>
> errdetail_abort() was introduced in commit a8ce974cdd. I suppose it was needed back then, but the signal handling has changed a lot since. Looking at that commit now, though, I don't really understand how it was reachable even back then. (Except with a race with an unrelated transaction abort, see commit message)
>
> Has anyone seen the "DETAIL: Abort reason: recovery conflict" in recent years, or ever? If not, let's rip it out.
>

I did a Google search and couldn't find any post reporting the message, which seems to prove that message can be removed.

>
> 0002: Don't hint that you can reconnect when the database is dropped
> --------------------------------------------------------------------
>
> If you're connected to a database is being dropped, during recovery, you get an error like this:
>
> FATAL: terminating connection due to conflict with recovery
> DETAIL: User was connected to a database that must be dropped.
> HINT: In a moment you should be able to reconnect to the database and repeat your command.
>
> The hint seems misleading. The database is being dropped, you most likely can *not* reconnect to it. Let's remove it.
>

I like this change. Not only removing the misleading error message, the code is also clearer now.

>
> 0003-0004: Separate RecoveryConflictReasons from procsignals
> -------------------------------------------------------------
>
> We're currently using different PROCSIG_* flags to indicate different kinds of recovery conflicts. We're also abusing the same flags in functions like LogRecoveryConflict, which isn't related to inter-process signaling. It seems better to have a separate enum for the recovery conflict reasons. With this patch, there's just a single PROCSIG_RECOVERY_CONFLICT to wake up a process on a recovery conflict, and the reason is communicated by setting a flag in a bitmask in PGPROC.
>
> I was inspired to do this in preparation of my project to replaces latches with "interrupts". By having just a single PROCSIG flag, we reduce the need for "interrupt bits" with that project. But it seems nicer on its own merits too.
>
>
> 0005: Refactor ProcessRecoveryConflictInterrupt for readability
> ---------------------------------------------------------------
>
> The function had a switch-statement with fallthrough through all the cases. It took me a while to understand how it works. Once I finally understood it, I refactored it to not rely on the fallthrough. I hope this makes it easier for others too.
>
> - Heikki
> <0001-Remove-useless-errdetail_abort.patch><0002-Don-t-hint-that-you-can-reconnect-when-the-database-.patch><0003-Use-ProcNumber-rather-than-pid-in-ReplicationSlot.patch><0004-Separate-RecoveryConflictReasons-from-procsignals.patch><0005-Refactor-ProcessRecoveryConflictInterrupt-for-readab.patch>

A few comments on 003-0005:

1 - 0003
```
ReplicationSlotAcquire(const char *name, bool nowait, bool error_if_invalid)
{
ReplicationSlot *s;
- int active_pid;
+ ProcNumber active_proc;
+ pid_t active_pid;
```

Active_pid is only used inside the "if (active_proc != MyProcNumber)” clause, so it can be only defined within the “if” clause.

2 - 0003
```
if (MyBackendType == B_STARTUP)
- (void) SendProcSignal(active_pid,
- PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT,
- INVALID_PROC_NUMBER);
+ SendProcSignal(active_pid,
+ PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT,
+ active_proc);
```

Here active_proc!=INVALID_PROC_NUMBER, so this changes the original logic without an explanation. Is the change intentional?

3 - 0004
```
+ * This is a bitmask of RecoveryConflictReasons
+ */
+ pg_atomic_uint32 pendingRecoveryConflicts;
```

I just feel this comment is a little bit confusing because RecoveryConflictReasons is an enum. Maybe we can say something like: This is a bitmask; each bit corresponds to one RecoveryConflictReason enum value.

4 - 0004
```
- (void) SendProcSignal(pid, sigmode, vxid.procNumber);
+ (void) SendProcSignal(pid, PROCSIG_RECOVERY_CONFLICT, vxid.procNumber);
```

Nit: Here (void) is retained for SendProcSignal, but in the place of commit 2 for 0003, (void) is deleted when calling SendProcSignal, is there any reason for retaining this one and deleting that one?

5 - 0004
```
+ * doesn't check for deadlock direcly, because we want to kill one of the
```

Typo: direcly -> directly

6 - 0005
```
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index a2fa98ee971..bbf2254ca67 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -179,11 +179,15 @@ static bool IsTransactionExitStmt(Node *parsetree);
static bool IsTransactionExitStmtList(List *pstmts);
static bool IsTransactionStmtList(List *pstmts);
static void drop_unnamed_stmt(void);
+static void ProcessRecoveryConflictInterrupts(void);
+static void ProcessRecoveryConflictInterrupt(RecoveryConflictReason reason);
+static void report_recovery_conflict(RecoveryConflictReason reason);
static void log_disconnections(int code, Datum arg);
static void enable_statement_timeout(void);
static void disable_statement_timeout(void);


+
/* ----------------------------------------------------------------
```

Nit: No need to add this empty line.

7 - 0005
```
+static void
+report_recovery_conflict(RecoveryConflictReason reason)
+{
+ bool fatal;

+ if (RECOVERY_CONFLICT_DATABASE)
+ {
```

I believe this should be if (reason == RECOVERY_CONFLICT_DATABASE).

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Chao Li 2026-01-23 02:48:44 Re: DOCS - "\d mytable" also shows any publications that publish mytable
Previous Message Fujii Masao 2026-01-23 02:40:58 Re: DOCS - "\d mytable" also shows any publications that publish mytable