| From: | Ayush Tiwari <ayushtiwari(dot)slg01(at)gmail(dot)com> |
|---|---|
| To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
| Cc: | pgsql-hackers(at)postgresql(dot)org, "noah(at)leadboat(dot)com" <noah(at)leadboat(dot)com> |
| Subject: | Re: [PATCH] postmaster: fix stale PM_STARTUP comment |
| Date: | 2026-04-17 13:47:18 |
| Message-ID: | CAJTYsWX=EkwTvA15=T8dE1c4iVoapTT5=nkrByd2h-18NU8hZA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi,
On Wed, 15 Apr 2026 at 23:31, Ayush Tiwari <ayushtiwari(dot)slg01(at)gmail(dot)com>
wrote:
> Hi
>
> On Wed, 15 Apr 2026 at 22:21, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
>> On 15/04/2026 16:57, Ayush Tiwari wrote:
>> > Hi,
>> >
>> > The comment above the PM_STARTUP startup-process-failure case still says
>> > that there are no other processes running yet, so the postmaster can
>> just
>> > exit.
>> >
>> > That no longer matches the current startup flow: PM_STARTUP may already
>> > have auxiliary processes running by that point. The attached patch
>> updates
>> > that comment to describe the current behavior.
>>
>> Hmm, shouldn't the postmaster kill and wait for the auxiliary processes
>> to exit first in that case? ISTM we need code changes here, not just
>> comments.
>>
>> - Heikki
>>
>>
> Yes, I agree, code change is required here.
>
> The proper thing is to
> route this through the existing crash-handling path so the postmaster
> SIGQUITs the aux children and waits for them to exit before terminating.
>
> I think the minimal change is:
>
> 1. Replace the ExitPostmaster(1) shortcut in the PM_STARTUP
> startup-failure case with HandleChildCrash(), which calls
> TerminateChildren(SIGQUIT) and transitions through the state
> machine. Set StartupStatus = STARTUP_CRASHED so the state
> machine does not try to reinitialize.
>
> 2. Let HandleFatalError() handle PM_STARTUP by transitioning to
> PM_WAIT_BACKENDS, instead of the current Assert(false).
>
>
The minimal fix turned out to be smaller than I first described, the
existing paragraph immediately below the ExitPostmaster(1) block already
handles !EXIT_STATUS_0 with StartupStatus != STARTUP_SIGNALED correctly
(sets STARTUP_CRASHED and HandleChildCrash). So, likely fix would be:
1. Deleting the PM_STARTUP ExitPostmaster(1) shortcut, and letting
execution fall through to the next stanza.
2. Replacing the Assert(false) for PM_STARTUP in HandleFatalError() with a
fall-through to UpdatePMState(PM_WAIT_BACKENDS).
Verification that I did for patch:
On a fresh initdb'd cluster, I zeroed out the first WAL segment to force
the startup process to FATAL at StartupXLOG, then ran PG in foreground
under strace.
Before (master):
LOG: startup process (PID N) exited with exit code 1
LOG: aborting startup due to startup process failure
LOG: database system is shut down
strace of the postmaster PID shows 0 kill() calls to children before
exit_group(1). Checkpointer, bgwriter and io workers were running at
the time of the failure and were orphaned.
After (patched):
LOG: startup process (PID N) exited with exit code 1
LOG: terminating any other active server processes
<state transitions PM_STARTUP -> PM_WAIT_BACKENDS -> PM_WAIT_DEAD_END
-> PM_NO_CHILDREN>
LOG: shutting down due to startup process failure
LOG: database system is shut down
strace shows 8 SIGQUIT deliveries (4 children, each signaled by PID
and by process-group) before the postmaster's own exit_group(1).
I've attached a patch, please review and let me know your thoughts.
Regards,
Ayush
| Attachment | Content-Type | Size |
|---|---|---|
| 0001-postmaster-drain-aux-processes-on-startup-process-fa.patch | application/octet-stream | 3.2 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | vignesh C | 2026-04-17 13:55:42 | Re: Fix tab completion after EXCEPT (...) in IMPORT FOREIGN SCHEMA |
| Previous Message | Tatsuo Ishii | 2026-04-17 13:47:08 | Re: Row pattern recognition |