| From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
|---|---|
| To: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
| Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Refactor recovery conflict signaling a little |
| Date: | 2026-02-03 13:31:36 |
| Message-ID: | 81b2c9a1-6d75-4ba9-894c-fd66c9036d99@iki.fi |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Thanks for the review!
On 23/01/2026 04:41, Chao Li wrote:
>> On Jan 23, 2026, at 07:20, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> Patch 0001: Remove useless errdetail_abort()
>> --------------------------------------------
>>
>> ...
>>
>> Has anyone seen the "DETAIL: Abort reason: recovery conflict" in recent years, or ever? If not, let's rip it out.
>
> I did a Google search and couldn't find any post reporting the message, which seems to prove that message can be removed.
>
>> 0002: Don't hint that you can reconnect when the database is dropped
>> --------------------------------------------------------------------
>>
>> If you're connected to a database is being dropped, during recovery, you get an error like this:
>>
>> FATAL: terminating connection due to conflict with recovery
>> DETAIL: User was connected to a database that must be dropped.
>> HINT: In a moment you should be able to reconnect to the database and repeat your command.
>>
>> The hint seems misleading. The database is being dropped, you most likely can *not* reconnect to it. Let's remove it.
>
> I like this change. Not only removing the misleading error message, the code is also clearer now.
Pushed patches 0001 and 0002.
> 1 - 0003
> ```
> ReplicationSlotAcquire(const char *name, bool nowait, bool error_if_invalid)
> {
> ReplicationSlot *s;
> - int active_pid;
> + ProcNumber active_proc;
> + pid_t active_pid;
> ```
>
> Active_pid is only used inside the "if (active_proc != MyProcNumber)” clause, so it can be only defined within the “if” clause.
It needs to fetched while still holding the lock. We could arrange the
code to avoid fetching it when active_proc == MyProcNumber, but it seems
it would be less clear, and this isn't performance critical.
> 2 - 0003
> ```
> if (MyBackendType == B_STARTUP)
> - (void) SendProcSignal(active_pid,
> - PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT,
> - INVALID_PROC_NUMBER);
> + SendProcSignal(active_pid,
> + PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT,
> + active_proc);
> ```
>
> Here active_proc!=INVALID_PROC_NUMBER, so this changes the original logic without an explanation. Is the change intentional?
Yes, it's intentional. SendProcSignal() is more efficient when you pass
procNumber argument. We didn't pass it here before because we didn't
have it, but now we do.
> 3 - 0004
> ```
> + * This is a bitmask of RecoveryConflictReasons
> + */
> + pg_atomic_uint32 pendingRecoveryConflicts;
> ```
>
> I just feel this comment is a little bit confusing because RecoveryConflictReasons is an enum. Maybe we can say something like: This is a bitmask; each bit corresponds to one RecoveryConflictReason enum value.
Ok, adopted that wording.
> 4 - 0004
> ```
> - (void) SendProcSignal(pid, sigmode, vxid.procNumber);
> + (void) SendProcSignal(pid, PROCSIG_RECOVERY_CONFLICT, vxid.procNumber);
> ```
>
> Nit: Here (void) is retained for SendProcSignal, but in the place of commit 2 for 0003, (void) is deleted when calling SendProcSignal, is there any reason for retaining this one and deleting that one?
I added the (void) back to the SendProcSignal call in 0003. We're not
very consistent about that, there are SendProcSignal calls with and
without the (void) in the codebase. But there was no particular reason
to change it in this patch.
> 7 - 0005
> ```
> +static void
> +report_recovery_conflict(RecoveryConflictReason reason)
> +{
> + bool fatal;
>
> + if (RECOVERY_CONFLICT_DATABASE)
> + {
> ```
>
> I believe this should be if (reason == RECOVERY_CONFLICT_DATABASE).
Oops yes, fixed!
Attached are new versions of the remaining patches, with the above and
the typo and whitespace fixes that you pointed out.
- Heikki
| Attachment | Content-Type | Size |
|---|---|---|
| v2-0001-Use-ProcNumber-rather-than-pid-in-ReplicationSlot.patch | text/x-patch | 11.1 KB |
| v2-0002-Separate-RecoveryConflictReasons-from-procsignals.patch | text/x-patch | 37.9 KB |
| v2-0003-Refactor-ProcessRecoveryConflictInterrupt-for-rea.patch | text/x-patch | 15.6 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Bertrand Drouvot | 2026-02-03 13:39:22 | Re: rename and move AssertVariableIsOfType |
| Previous Message | cca5507 | 2026-02-03 13:30:15 | Re: Batching in executor |