Re: add assertion for palloc in signal handlers

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Kirill Reshke <reshkekirill(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: add assertion for palloc in signal handlers
Date: 2026-02-19 09:25:43
Message-ID: b3914af0-1e94-4b7a-941f-eed0c7fd5fc5@iki.fi
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 19/02/2026 09:21, Kirill Reshke wrote:
> On Thu, 19 Feb 2026 at 02:50, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> I don't think die() should call ereport() without being in single user mode. I
>> guess there's a corner case though, which is that the signal handler is
>> executed during exit processing, when we already reset whereToSendOutput. I
>> think we probably should make sure this stuff is only reached in actual single
>> user mode.

+1

> Hi! Thank you for your input.
> Stupid question from me: in singleuser mode, how should this work? We
> still should not ereport inside signal handlers, are we?
> If singleuser mode is somehow safe with die & ereport then... I don't
> understand how.

It's not safe to ereport() from die() in single-user mode either. We've
just accepted it as the lesser evil:

> /*
> * If we're in single user mode, we want to quit immediately - we can't
> * rely on interrupts as they wouldn't work when stdin/stdout is a file.
> * Rather ugly, but it's unlikely to be worthwhile to invest much more
> * effort just for the benefit of single user mode.
> */
> if (DoingCommandRead && whereToSendOutput != DestRemote)
> ProcessTerminateInterrupt();

I don't quite understand that though. I understand that a read/write on
a file might be uninterruptible. But so what? It presumably won't take
very long, so it's fine to wait for the read/write to finish. Are we
worried about getting stuck on an NFS read when the network is down, or
something like that?

This was the discussion back then
[https://www.postgresql.org/message-id/20150203202811.GA12566%40awork2.anarazel.de]

>> Could you use WaitLatchOrSocket to sleep on stdin?
>
> Unfortunately I don't think so. poll() etc only works properly on
> network handles, pipes etc - but stdin can be a file :(. And I think
> what exactly happens if it's a file fd isn't super well defined. On
> linux the file is always marked as ready, which would probably actually
> work...

I think that's actually wrong. POSIX says
[https://pubs.opengroup.org/onlinepubs/9799919799/functions/poll.html]

> The poll() and ppoll() functions shall support regular files,
> terminal and pseudo-terminal devices, FIFOs, pipes, and sockets. The
> behavior of poll() and ppoll() on elements of fds that refer to
> other types of file is unspecified.
>
> Regular files shall always poll TRUE for reading and writing.

So that is pretty well-defined, and we could use poll() on stdin too.

>> I'm somewhat confused by
>>
>> /* Don't joggle the elbow of proc_exit */
>> if (!proc_exit_inprogress)
>> {
>> InterruptPending = true;
>> ProcDiePending = true;
>> }
>>
>> because the only thing that avoids is actually harmless stuff, we're not going
>> to notice that , whereas the ProcessInterrupts() done further down is actually
>> certainly "joggling the elbow of proc_exit().
>>
>> <digs>
>>
>> I think the comment hasn't quite kept up with the time, the block used to do
>> more. Some dude named Andres didn't remove the comment in commit 2505ce0be0b.
>>
>> And another dude, confusingly also named Andres, in commit 4f85fde8eb86,
>> didn't think about the scenario that whereToSendOutput could already be reset
>> to DestNone in die()
>>
>> Ooops.

Yeah, I've been wondering about that "joggle the elbow" comment too. We
don't have that check in any other signal handlers that set
InterruptPending. I believe we could just remove the if(...) and set
InterruptPending and ProcDiePending unconditionally here.

>> quickdie() obviously does does reach the ereport(). I think it's a bad idea,
>> with a bad justification: For one, libpq makes up the same error reason if the
>> client just vanishes. For another, because it just uses
>> WARNING[_CLIENT_ONLY], it'll not even reach clients if they use a higher
>> client_min_messages.
>>
>> Just hoping that the client communication, including openssl, is in a state
>> that makes it somewhat safe to send messages from a signal handler, is
>> bonkers, IMNSHO. Yes, we have the "safety" mechanism of postmaster SIGKILLing
>> processes, but that only protects against self deadlocks, not against
>> corrupted datastructures etc.

PqCommBusy gives some protection from messing with openssl state. But
yes, it's generally unsafe.

>> It's one thing to ereport() in a signal handler during a crash triggered
>> quickdie(), the shit already has hit the fan after all, but doing it as part
>> of an intended immediate shutdown is a seriously bad idea.

I don't see much difference between a crash and an "intended immediate
shutdown". In both cases, you don't want to corrupt datastructures etc.
more than you already have, but on the other hand the system is going
down anyway.

> So, after all, we need to remove ereport from quick die completely, no
> backup plan?
> This will make whole system less verbose about its shutdown reasons...
> Not sure how vital that is

What's the proposed alternative to ereporting() from quickdie? If we
just remove the ereport(), yeah, clients will miss out on the messages.
Are we OK with that?

One alternative is to add more checks to quickdie(), and refrain from
ereporting() if we're in the middle of an existing ereport(), for
example, or some other such state. We probably cannot make it 100% safe
with that approach, but maybe that's the right tradeoff.

Another alternative is to try to handle SIGQUIT more like SIGTERM, and
delay the exiting until next CHECK_FOR_INTERRUPTS(). Maybe with a short
timer to just SIGKILL if it the interrupt isn't processed quickly. It
increases the risk that the process won't exit as quickly as we'd like,
but maybe that's the right tradeoff.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2026-02-19 09:31:22 Re: SQL Property Graph Queries (SQL/PGQ)
Previous Message Amit Langote 2026-02-19 09:21:27 Re: Eliminating SPI / SQL from some RI triggers - take 3