| From: | Xuneng Zhou <xunengzhou(at)gmail(dot)com> |
|---|---|
| To: | Michael Paquier <michael(at)paquier(dot)xyz>, Noah Misch <noah(at)leadboat(dot)com> |
| Cc: | ryanzxg(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org, Euler Taveira <euler(at)eulerto(dot)com> |
| Subject: | Re: BUG #19093: Behavioral change in walreceiver termination between PostgreSQL 14.17 and 14.18 |
| Date: | 2025-10-30 16:07:01 |
| Message-ID: | CABPTF7XE--Dc97LXj5Xum1U4v5=CQnrVYFbVjWTWu4J7gCNXAg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-bugs |
Hi,
On Thu, Oct 30, 2025 at 7:36 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Wed, Oct 29, 2025 at 02:36:10PM +0800, Xuneng Zhou wrote:
> > Should we wrap this assertion within #ifdef USE_ASSERT_CHECKING?
> >
> > #ifdef USE_ASSERT_CHECKING
> > {
> > WalRcvData *walrcv = WalRcv;
> > WalRcvState state;
> > SpinLockAcquire(&walrcv->mutex);
> > state = walrcv->walRcvState;
> > SpinLockRelease(&walrcv->mutex);
> > Assert(state != WALRCV_STOPPING);
> > }
> > #endif
>
> Yep. Or provide a wrapper routine that's able to retrieve the state
> of a WAL receiver, then add it in xlogrecovery.c within an Assert().
> It may be better to only do that on HEAD, the back-branches stress me
> a bit when we enforce such new assumptions.. Perhaps Noah has a
> different opinion.
Using a wrapper function seems better here.
> > However, I think Michael’s approach is better — it’s simpler, more
> > straightforward, and captures the buggy behavior well. I plan to
> > incorporate it into the next version of the patch.
>
> Thanks!
>
> One weakness of this approach is that this test would not be able to
> work if the error message is reworded for some reason. On stable
> branches, I doubt that this error message will ever change, but it
> could be updated to something else on HEAD.
>
> Hence, I would suggest one extra improvement for HEAD: let's also add
> a check for a valid case where a WAL receiver is stopped. We have one
> in 040_standby_failover_slots_sync.pl, where primary_conninfo is
> updated in standby1 and its configuration reloaded, so I'd be tempted
> to add a valid check in the server logs of $standby1. One difference
> is that we should grab the offset of the logs before the reload, then
> make sure that the log entry gets created with a wait_for_log(), as
> there may be a delay between the reload and the moment the log entry
> is generated.
Thanks for your suggestion!
I’ve incorporated both tests and the assertion into the patch applied to master.
For earlier stable branches, the test in
040_standby_failover_slots_sync.pl and the assertion for
WAL receiver state checking are not included (if Noah agrees too). Please check.
Best,
Xuneng
| Attachment | Content-Type | Size |
|---|---|---|
| v2-0001-Stable-Fix-unconditional-walreceiver-shutdown-during.patch | application/octet-stream | 4.6 KB |
| v2-0001-Head-Fix-unconditional-walreceiver-shutdown.patch | application/octet-stream | 7.4 KB |
| v2-0001-Version14-Fix-unconditional-walreceiver-shutdown-during.patch | application/octet-stream | 3.7 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Paul A Jungwirth | 2025-10-30 17:20:54 | Re: BUG #19098: Can't create unique gist index, where pg_indexes says that WITHOUT OVERLAPS does exacly that |
| Previous Message | Tom Lane | 2025-10-30 13:48:32 | Re: BUG #19099: Conditional DELETE from partitioned table with non-updatable partition raises internal error |