Re: Refactor recovery conflict signaling a little

From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Alexander Lakhin <exclusion(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
Subject: Re: Refactor recovery conflict signaling a little
Date: 2026-03-10 03:55:30
Message-ID: CABPTF7UhV+OkAK9AAFERP6+U7=2OMJ0QGtEYYN9j=WhPRZxdQg@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Tue, Mar 10, 2026 at 1:05 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
>
> On Mon, Mar 9, 2026 at 11:28 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> >
> > On 09/03/2026 17:02, Xuneng Zhou wrote:
> > > Did you use Alexander’s reproducer script? I tried reproducing with a
> > > 1 ms pg_usleep() added to all three functions that clear
> > > MyProc->pendingRecoveryConflicts, but I still couldn’t reproduce the
> > > issue.
> >
> > I used the attached, to be precise. With that it fails every time for
> > me. I'm not sure if the "if (am_walsender)" check is necessary, I added
> > it just to make the test run faster.
> >
> > - Heikki
>
> I was able to reproduce the issue using a wider sleep window as you
> suggested and can confirm that the flag is not cleared after applying
> the patch. Below are two logs—one from a successful run and one from a
> failed run. I'll look further into the patch later on.
>
> failed run:
> startup[1418915] LOG: DBG SignalRecoveryConflict target_pid=1419118
> reason=4 old_mask=0x0 new_mask=0x10
> walsender[1419118] LOG: DBG ProcArrayEndTransaction(no-xid) CLEARING
> pendingRecoveryConflicts=0x10
>
> successful run:
> startup[1433218] LOG: DBG SignalRecoveryConflict target_pid=1433406
> reason=4 old_mask=0x0 new_mask=0x10
> walsender[1433406] LOG: DBG ProcessInterrupts handler fired 1
> time(s), pending=0x10 -- processing
> walsender[1433406] ERROR: canceling statement due to conflict with recovery
>
> --
> Best,
> Xuneng

I ran the script several times with the patch applied, and every run
passed without deadlocking. LGTM.
One nit: should the comment fix and the InitAuxiliaryProcess hardening
be split into separate patches?
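
For anyone following along, the difference between the two logs quoted
above can be modeled with a toy sketch. This is not the patch and not
PostgreSQL code: the variable and function names below are hypothetical
stand-ins, C11 atomics are an assumption made for the illustration, and
the "bit = 1 << reason" mapping is only inferred from reason=4 yielding
new_mask=0x10 in the logs. It shows how an unconditional clear between
the signal and the interrupt handler drops the pending conflict.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for MyProc->pendingRecoveryConflicts. */
static _Atomic uint32_t pending_conflicts;

/* Bit for a conflict reason; reason=4 gives 0x10, matching the logs. */
static uint32_t
conflict_mask(int reason)
{
    return UINT32_C(1) << reason;
}

/* Signaling side: set the reason's bit and return the resulting mask. */
static uint32_t
signal_conflict(int reason)
{
    uint32_t bit = conflict_mask(reason);

    return atomic_fetch_or(&pending_conflicts, bit) | bit;
}

/* Target side: unconditional reset that discards anything pending. */
static void
clear_pending(void)
{
    atomic_store(&pending_conflicts, 0);
}

/* Target side: fetch and reset in one step; the caller acts on the result. */
static uint32_t
consume_pending(void)
{
    return atomic_exchange(&pending_conflicts, 0);
}

/* Ordering seen in the failed run: signal, clear, then the handler. */
static uint32_t
failed_ordering(int reason)
{
    clear_pending();            /* start from an empty mask */
    signal_conflict(reason);
    clear_pending();            /* the conflict bit is lost here */
    return consume_pending();   /* handler finds nothing to process */
}

/* Ordering seen in the successful run: signal, then the handler. */
static uint32_t
successful_ordering(int reason)
{
    clear_pending();            /* start from an empty mask */
    signal_conflict(reason);
    return consume_pending();   /* handler sees the conflict bit */
}
```

In this model failed_ordering(4) returns 0 and successful_ordering(4)
returns 0x10, which lines up with the handler never firing in the failed
run and reporting pending=0x10 in the successful one.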

--
Best,
Xuneng
