Re: assertion at postmaster start

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: assertion at postmaster start
Date: 2019-08-24 20:55:46
Message-ID: 27707.1566680146@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> I think what this demonstrates is that that Assert is just wrong:
> we *can* reach PM_RUN with the flag still set, so we should do
>
>              StartupStatus = STARTUP_NOT_RUNNING;
>              FatalError = false;
> -            Assert(AbortStartTime == 0);
> +            AbortStartTime = 0;
>              ReachedNormalRunning = true;
>              pmState = PM_RUN;
>
> Probably likewise for the similar Assert in sigusr1_handler.

Poking further at this, I noticed that the code just above here completely
fails to do what the comments say it intends to do, which is restart the
startup process after we've SIGQUIT'd it. That's because the careful
manipulation of StartupStatus in reaper (lines 2943ff in HEAD) is stomped
on by HandleChildCrash, which will just unconditionally reset it to
STARTUP_CRASHED (line 3507). So we end up shutting down the database
after all, which is not what the intention seems to be. Hence,
commit 45811be94 was still a few bricks shy of a load :-(.

I propose the attached. I'm inclined to think that the risk/benefit
of back-patching this is not very good, so I just want to stick it in
HEAD, unless somebody can explain why dead_end children are likely to
crash in the field.

regards, tom lane

Attachment Content-Type Size
handle-dead-end-child-crash-better-1.patch text/x-diff 2.4 KB