Re: Refactor recovery conflict signaling a little

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Xuneng Zhou <xunengzhou(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
Subject: Re: Refactor recovery conflict signaling a little
Date: 2026-03-07 11:00:01
Message-ID: 3e07149d-060b-48a0-8f94-3d5e4946ae45@gmail.com
Lists: pgsql-hackers

Hello Xuneng and Heikki,

04.03.2026 07:33, Xuneng Zhou wrote:
>> 03.03.2026 17:39, Heikki Linnakangas wrote:
>>> On 24/02/2026 10:00, Alexander Lakhin wrote:
>>>> The "terminating process ..." message doesn't appear when the test passes
>>>> successfully.
>>> Hmm, right, looks like something wrong in signaling the recovery conflict. I can't tell if the signal is being sent,
>>> or it's not processed correctly. Looking at the code, I don't see anything wrong.
>>>
> I was unable to reproduce the issue on an x86_64 Linux machine using
> the provided script. All test runs completed successfully without any
> failures.

I've added debug logging (see attached) and saw the following in a failed run:
!!!SignalRecoveryConflict[282363]
!!!ProcArrayEndTransaction| pendingRecoveryConflicts = 0
!!!ProcessInterrupts[283863]| MyProc->pendingRecoveryConflicts: 0
!!!ProcessInterrupts[283863]| MyProc->pendingRecoveryConflicts: 0
2026-03-07 12:21:24.544 EET walreceiver[282421] FATAL:  could not receive data from WAL stream: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
2026-03-07 12:21:24.645 EET postmaster[282355] LOG:  received immediate shutdown request
2026-03-07 12:21:24.647 EET postmaster[282355] LOG:  database system is shut down

For a successful run, by contrast, I see:
2026-03-07 12:18:17.075 EET startup[285260] DETAIL:  The slot conflicted with xid horizon 677.
2026-03-07 12:18:17.075 EET startup[285260] CONTEXT:  WAL redo at 0/04022130 for Heap2/PRUNE_ON_ACCESS: snapshotConflictHorizon: 677, isCatalogRel: T, nplans: 0, nredirected: 0, ndead: 2, nunused: 0, dead: [35, 36]; blkref #0: rel 1663/16384/16418, blk 10
!!!SignalRecoveryConflict[285260]
!!!ProcessInterrupts[286071]| MyProc->pendingRecoveryConflicts: 16
!!!ProcessRecoveryConflictInterrupts[286071]
!!!ProcessRecoveryConflictInterrupts[286071] pending: 16, reason: 4
2026-03-07 12:18:17.075 EET walsender[286071] 035_standby_logical_decoding.pl ERROR:  canceling statement due to conflict with recovery
2026-03-07 12:18:17.075 EET walsender[286071] 035_standby_logical_decoding.pl DETAIL:  User was using a logical replication slot that must be invalidated.

(Full logs for this failed run and a good run are attached.)
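
As a reading aid for the two traces above: in the good run the values "pending: 16, reason: 4" are consistent with a mask that holds one bit per conflict reason (16 == 1 << 4), while in the failed run the backend only ever observes 0. The sketch below is not PostgreSQL code; it assumes that one-bit-per-reason encoding, and the helper names lowest_pending_reason() and report_pending() are hypothetical, chosen only to show how such a mask would be decoded.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (an assumption, not taken from the PostgreSQL tree):
 * return the index of the lowest set bit in "pending", i.e. the first
 * conflict reason that would be processed, or -1 if no bit is set.
 */
static int
lowest_pending_reason(uint32_t pending)
{
    for (int reason = 0; reason < 32; reason++)
    {
        if (pending & (UINT32_C(1) << reason))
            return reason;
    }
    return -1;
}

/*
 * Hypothetical helper: print a mask in roughly the style of the debug
 * output quoted above.
 */
static void
report_pending(uint32_t pending)
{
    int reason = lowest_pending_reason(pending);

    if (reason < 0)
        printf("pending: %u, nothing to process\n", (unsigned) pending);
    else
        printf("pending: %u, reason: %d\n", (unsigned) pending, reason);
}
```

Under that assumption, report_pending(16) reports reason 4, matching the good run, and report_pending(0) reports nothing to process, matching what ProcessInterrupts saw in the failed run.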

Best regards,
Alexander

Attachment            Content-Type         Size
035_debugging.patch   text/x-patch         2.7 KB
035_logs.tar.bz2      application/x-bzip2  7.7 KB
