Re: [PATCH] Fix REPACK decoding worker not cleaned up on FATAL exit

From: Antonin Houska <ah(at)cybertec(dot)at>
To: Alvaro Herrera <alvherre(at)kurilemu(dot)de>
Cc: Baji Shaik <baji(dot)pgdev(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [PATCH] Fix REPACK decoding worker not cleaned up on FATAL exit
Date: 2026-05-20 07:54:06
Message-ID: 8656.1779263646@localhost
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)kurilemu(dot)de> wrote:

> On 2026-May-17, Baji Shaik wrote:
>
> > v3 uses PG_ENSURE_ERROR_CLEANUP which:
> > - Handles both ERROR and FATAL exits
> > - Automatically cancels the callback on normal completion
> > (no slot leak)
> > - Runs before memory contexts are destroyed (no use-after-free)
>
> Yeah, looks good. I have pushed it, with some comment wordsmithing and
> other cosmetic changes.
>
> While looking at it, I realized that I didn't like the way
> stop_repack_decoding_worker() works, mainly because if there's no
> handle, we leak everything else -- and the way we initialize things
> means we leak the shared memory segment. This is maybe a rare case and
> just a small memory leak, but it seems better to do it nicely. So
> here's a followup patch that reworks that code. This also forced me to
> understand more clearly what is going on, so I rewrote the comments.

The call of shm_mq_detach() got lost, or do you rely on dsm_detach() to call
shm_mq_detach_callback() ? The latter does not free ->mqh_buffer. Since each
REPACK runs in a separate transaction, I wouldn't consider that a leak, but I
still think that explicit call of shm_mq_detach() makes the code a bit easier
to read (i.e. no need for the developer to check if the detaching happens
automatically).

/*
- * If we could not cancel the current sleep due to ERROR, do that before
- * we detach from the shared memory the condition variable is located in.
- * If we did not, the bgworker ERROR handling code would try and fail
- * badly.
+ * Now detach from our shared memory segment. In error cases there might
+ * still be messages from the worker in the queue, which ProcessInterrupts
+ * would try to read; this is pointless (and causes an assertion failure),
+ * so set the global pointer to NULL to have ProcessRepackMessages ignore
+ * them.
*/
- ConditionVariableCancelSleep();
-
- dsm_detach(decoding_worker->seg);
+ dsmseg = decoding_worker->seg;
pfree(decoding_worker);
decoding_worker = NULL;
+
+ /* We must also cancel the current sleep, if one is still set up */
+ ConditionVariableCancelSleep();
+
+ if (dsmseg != NULL)
+ dsm_detach(dsmseg);

I suppose the reason for the assertion failure was reading from the queue
after the backend had detached from it? Thanks for fixing that.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Previous Message Chao Li 2026-05-20 07:53:38 Re: Fix pg_stat_wal_receiver to show CONNECTING status