| From: | Baji Shaik <baji(dot)pgdev(at)gmail(dot)com> |
|---|---|
| To: | Alvaro Herrera <alvherre(at)kurilemu(dot)de> |
| Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: [PATCH] Fix REPACK decoding worker not cleaned up on FATAL exit |
| Date: | 2026-05-18 00:34:20 |
| Message-ID: | CA+fm-RN2oL=mWXA6NLKNS5wrCXLML-z+O6euKW1ga2ZEYFVQ0Q@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi Alvaro, Sami,
Thank you both for the feedback.
Sami, thanks for identifying the use-after-free with on_proc_exit
and providing the v2 patch with before_shmem_exit. That helped
narrow down the right approach.
On Sun, May 17, 2026 at 3:46 PM Alvaro Herrera <alvherre(at)kurilemu(dot)de> wrote:
> I think a better fix for this is to use PG_ENSURE_ERROR_CLEANUP(). That
> way we avoid leaving callbacks in place, which would not be great if the
> same backend does a lot of REPACKs: after a dozen or so, it dies with
>
> FATAL: out of before_shmem_exit slots
>
>
Alvaro, you're right that PG_ENSURE_ERROR_CLEANUP is the better
approach here. Attached is v3 which uses it instead of
before_shmem_exit or on_proc_exit.
Summary of the issues with v1 and v2:
- v1 (on_proc_exit): Crashes with assertions enabled because
memory contexts are already destroyed by the time on_proc_exit
callbacks run. The worker handle is clobbered with 0x7f
(CLOBBER_FREED_MEMORY).
- v2 (before_shmem_exit): Works correctly for a single REPACK,
but leaks a callback slot on each successful REPACK
(CONCURRENTLY). After ~15 REPACKs in the same session:
FATAL: out of before_shmem_exit slots.
v3 uses PG_ENSURE_ERROR_CLEANUP which:
- Handles both ERROR and FATAL exits
- Automatically cancels the callback on normal completion
(no slot leak)
- Runs before memory contexts are destroyed (no use-after-free)
The existing PG_TRY/PG_FINALLY block is replaced with
PG_ENSURE_ERROR_CLEANUP for the concurrent path, and a plain
rebuild_relation() call for the non-concurrent path.
Tested with --enable-cassert --enable-debug --enable-injection-points:
- All 245 regression tests pass (including cluster)
- All 8 injection point tests pass (including repack and repack_toast)
- pg_terminate_backend cleans up worker and slot without crash
- 20 REPACK (CONCURRENTLY) in same session completes without
slot exhaustion
I have not added a dedicated regression test for the
pg_terminate_backend scenario yet, but I can write one using
injection points if needed.
Thanks,
Baji Shaik
| Attachment | Content-Type | Size |
|---|---|---|
| v3-0001-Fix-REPACK-decoding-worker-not-cleaned-up-on-FATAL-exit.patch | application/octet-stream | 3.5 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Junwang Zhao | 2026-05-18 02:07:02 | Re: (SQL/PGQ) cache lookup failed for label |
| Previous Message | Ashutosh Bapat | 2026-05-18 00:22:19 | Re: (SQL/PGQ) cache lookup failed for label |