| From: | Antonin Houska <ah(at)cybertec(dot)at> |
|---|---|
| To: | Alvaro Herrera <alvherre(at)kurilemu(dot)de> |
| Cc: | Baji Shaik <baji(dot)pgdev(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: [PATCH] Fix REPACK decoding worker not cleaned up on FATAL exit |
| Date: | 2026-05-20 07:54:06 |
| Message-ID: | 8656.1779263646@localhost |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Alvaro Herrera <alvherre(at)kurilemu(dot)de> wrote:
> On 2026-May-17, Baji Shaik wrote:
>
> > v3 uses PG_ENSURE_ERROR_CLEANUP which:
> > - Handles both ERROR and FATAL exits
> > - Automatically cancels the callback on normal completion
> > (no slot leak)
> > - Runs before memory contexts are destroyed (no use-after-free)
>
> Yeah, looks good. I have pushed it, with some comment wordsmithing and
> other cosmetic changes.
>
> While looking at it, I realized that I didn't like the way
> stop_repack_decoding_worker() works, mainly because if there's no
> handle, we leak everything else -- and the way we initialize things
> means we leak the shared memory segment. This is maybe a rare case and
> just a small memory leak, but it seems better to do it nicely. So
> here's a followup patch that reworks that code. This also forced me to
> understand more clearly what is going on, so I rewrote the comments.
The call of shm_mq_detach() got lost, or do you rely on dsm_detach() to call
shm_mq_detach_callback() ? The latter does not free ->mqh_buffer. Since each
REPACK runs in a separate transaction, I wouldn't consider that a leak, but I
still think that explicit call of shm_mq_detach() makes the code a bit easier
to read (i.e. no need for the developer to check if the detaching happens
automatically).
/*
- * If we could not cancel the current sleep due to ERROR, do that before
- * we detach from the shared memory the condition variable is located in.
- * If we did not, the bgworker ERROR handling code would try and fail
- * badly.
+ * Now detach from our shared memory segment. In error cases there might
+ * still be messages from the worker in the queue, which ProcessInterrupts
+ * would try to read; this is pointless (and causes an assertion failure),
+ * so set the global pointer to NULL to have ProcessRepackMessages ignore
+ * them.
*/
- ConditionVariableCancelSleep();
-
- dsm_detach(decoding_worker->seg);
+ dsmseg = decoding_worker->seg;
pfree(decoding_worker);
decoding_worker = NULL;
+
+ /* We must also cancel the current sleep, if one is still set up */
+ ConditionVariableCancelSleep();
+
+ if (dsmseg != NULL)
+ dsm_detach(dsmseg);
I suppose the reason for the assertion failure was reading from the queue
after the backend had detached from it? Thanks for fixing that.
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
| From | Date | Subject | |
|---|---|---|---|
| Previous Message | Chao Li | 2026-05-20 07:53:38 | Re: Fix pg_stat_wal_receiver to show CONNECTING status |