| From: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
|---|---|
| To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
| Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Refactor recovery conflict signaling a little |
| Date: | 2026-01-23 02:41:40 |
| Message-ID: | EA619211-93E3-40BF-8EC0-AED29B87AC33@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
> On Jan 23, 2026, at 07:20, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> I had a look at recovery conflict signaling and a few things caught my eye. No functional changes, but some cleanups and readability improvements:
>
> Patch 0001: Remove useless errdetail_abort()
> --------------------------------------------
>
> The function is supposed to add DETAIL to errors when you are in an aborted transaction, if the transaction was aborted by a recovery conflict, like this:
>
> ERROR: current transaction is aborted, commands ignored until end of transaction block"
> DETAIL: Abort reason: recovery conflict
>
> But I don't see how to reach that. If a transaction is aborted by recovery conflict, you get a different error like this:
>
> ERROR: canceling statement due to conflict with recovery
> DETAIL: User was holding a relation lock for too long.
>
> The transaction abort clears the 'recoveryConflictPending' flag, so even if that happens in a transaction block, you don't get that "DETAIL: Abort reason: recovery conflict" in the subsequent errors.
>
> errdetail_abort() was introduced in commit a8ce974cdd. I suppose it was needed back then, but the signal handling has changed a lot since. Looking at that commit now, though, I don't really understand how it was reachable even back then. (Except with a race with an unrelated transaction abort, see commit message)
>
> Has anyone seen the "DETAIL: Abort reason: recovery conflict" in recent years, or ever? If not, let's rip it out.
>
I did a Google search and couldn't find any post reporting the message, which seems to prove that message can be removed.
>
> 0002: Don't hint that you can reconnect when the database is dropped
> --------------------------------------------------------------------
>
> If you're connected to a database is being dropped, during recovery, you get an error like this:
>
> FATAL: terminating connection due to conflict with recovery
> DETAIL: User was connected to a database that must be dropped.
> HINT: In a moment you should be able to reconnect to the database and repeat your command.
>
> The hint seems misleading. The database is being dropped, you most likely can *not* reconnect to it. Let's remove it.
>
I like this change. Not only removing the misleading error message, the code is also clearer now.
>
> 0003-0004: Separate RecoveryConflictReasons from procsignals
> -------------------------------------------------------------
>
> We're currently using different PROCSIG_* flags to indicate different kinds of recovery conflicts. We're also abusing the same flags in functions like LogRecoveryConflict, which isn't related to inter-process signaling. It seems better to have a separate enum for the recovery conflict reasons. With this patch, there's just a single PROCSIG_RECOVERY_CONFLICT to wake up a process on a recovery conflict, and the reason is communicated by setting a flag in a bitmask in PGPROC.
>
> I was inspired to do this in preparation of my project to replaces latches with "interrupts". By having just a single PROCSIG flag, we reduce the need for "interrupt bits" with that project. But it seems nicer on its own merits too.
>
>
> 0005: Refactor ProcessRecoveryConflictInterrupt for readability
> ---------------------------------------------------------------
>
> The function had a switch-statement with fallthrough through all the cases. It took me a while to understand how it works. Once I finally understood it, I refactored it to not rely on the fallthrough. I hope this makes it easier for others too.
>
> - Heikki
> <0001-Remove-useless-errdetail_abort.patch><0002-Don-t-hint-that-you-can-reconnect-when-the-database-.patch><0003-Use-ProcNumber-rather-than-pid-in-ReplicationSlot.patch><0004-Separate-RecoveryConflictReasons-from-procsignals.patch><0005-Refactor-ProcessRecoveryConflictInterrupt-for-readab.patch>
A few comments on 003-0005:
1 - 0003
```
ReplicationSlotAcquire(const char *name, bool nowait, bool error_if_invalid)
{
ReplicationSlot *s;
- int active_pid;
+ ProcNumber active_proc;
+ pid_t active_pid;
```
Active_pid is only used inside the "if (active_proc != MyProcNumber)” clause, so it can be only defined within the “if” clause.
2 - 0003
```
if (MyBackendType == B_STARTUP)
- (void) SendProcSignal(active_pid,
- PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT,
- INVALID_PROC_NUMBER);
+ SendProcSignal(active_pid,
+ PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT,
+ active_proc);
```
Here active_proc!=INVALID_PROC_NUMBER, so this changes the original logic without an explanation. Is the change intentional?
3 - 0004
```
+ * This is a bitmask of RecoveryConflictReasons
+ */
+ pg_atomic_uint32 pendingRecoveryConflicts;
```
I just feel this comment is a little bit confusing because RecoveryConflictReasons is an enum. Maybe we can say something like: This is a bitmask; each bit corresponds to one RecoveryConflictReason enum value.
4 - 0004
```
- (void) SendProcSignal(pid, sigmode, vxid.procNumber);
+ (void) SendProcSignal(pid, PROCSIG_RECOVERY_CONFLICT, vxid.procNumber);
```
Nit: Here (void) is retained for SendProcSignal, but in the place of commit 2 for 0003, (void) is deleted when calling SendProcSignal, is there any reason for retaining this one and deleting that one?
5 - 0004
```
+ * doesn't check for deadlock direcly, because we want to kill one of the
```
Typo: direcly -> directly
6 - 0005
```
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index a2fa98ee971..bbf2254ca67 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -179,11 +179,15 @@ static bool IsTransactionExitStmt(Node *parsetree);
static bool IsTransactionExitStmtList(List *pstmts);
static bool IsTransactionStmtList(List *pstmts);
static void drop_unnamed_stmt(void);
+static void ProcessRecoveryConflictInterrupts(void);
+static void ProcessRecoveryConflictInterrupt(RecoveryConflictReason reason);
+static void report_recovery_conflict(RecoveryConflictReason reason);
static void log_disconnections(int code, Datum arg);
static void enable_statement_timeout(void);
static void disable_statement_timeout(void);
+
/* ----------------------------------------------------------------
```
Nit: No need to add this empty line.
7 - 0005
```
+static void
+report_recovery_conflict(RecoveryConflictReason reason)
+{
+ bool fatal;
+ if (RECOVERY_CONFLICT_DATABASE)
+ {
```
I believe this should be if (reason == RECOVERY_CONFLICT_DATABASE).
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Chao Li | 2026-01-23 02:48:44 | Re: DOCS - "\d mytable" also shows any publications that publish mytable |
| Previous Message | Fujii Masao | 2026-01-23 02:40:58 | Re: DOCS - "\d mytable" also shows any publications that publish mytable |