common signal handler protection

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: andres(at)anarazel(dot)de, noah(at)leadboat(dot)com
Subject: common signal handler protection
Date: 2023-11-21 21:20:08
Message-ID: 20231121212008.GA3742740@nathanxps13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

In commit 97550c0, I added a "MyProcPid == getpid()" check in the SIGTERM
handler for the startup process. This ensures that processes forked by
system(3) (i.e., for restore_command) that have yet to install their own
signal handlers do not call proc_exit() upon receiving SIGTERM. Without
this protection, both the startup process and the restore_command process
might try to remove themselves from the PGPROC shared array (among other
things), which can end badly.

Since then, I've been exploring a more general approach that would offer
protection against similar issues in the future. We probably don't want
signal handlers in these grandchild processes to touch shared memory at
all. The attached 0001 is an attempt at adding such protection for all
handlers installed via pqsignal(). In short, it stores the actual handler
functions in a separate array, and sigaction() is given a wrapper function
that performs the "MyProcPid == getpid()" check. If that check fails, the
wrapper function installs the default signal handler and calls it.

Besides allowing us to revert commit 97550c0 (see attached 0003), this
wrapper handler could also restore errno, as shown in 0002. Right now,
individual signal handlers must do this today as needed, but that seems
easy to miss and prone to going unnoticed for a long time.

I see two main downsides of this proposal:

* Overhead: The wrapper handler calls a function pointer and getpid(),
which AFAICT is a real system call on most platforms. That might not be
a tremendous amount of overhead, but it's not zero, either. I'm
particularly worried about signal-heavy code like synchronous
replication. (Are there other areas that should be tested?) If this is
a concern, perhaps we could allow certain processes to opt out of this
wrapper handler, provided we believe it is unlikely to fork or that the
handler code is safe to run in grandchild processes.

* Race conditions: With these patches, pqsignal() becomes quite racy when
used within signal handlers. Specifically, you might get a bogus return
value. However, there are no in-tree callers of pqsignal() that look at
the return value (and I don't see any reason they should), and it seems
unlikely that pqsignal() will be used within signal handlers frequently,
so this might not be a deal-breaker. I did consider trying to convert
pqsignal() into a void function, but IIUC that would require an SONAME
bump. For now, I've just documented the bogosity of the return values.

Thoughts?

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v1-0001-Check-that-MyProcPid-getpid-in-all-signal-handler.patch text/x-diff 4.0 KB
v1-0002-Centralize-logic-for-restoring-errno-in-signal-ha.patch text/x-diff 10.2 KB
v1-0003-Revert-Avoid-calling-proc_exit-in-processes-forke.patch text/x-diff 4.7 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2023-11-21 21:41:26 Re: Remove distprep
Previous Message Daniel Gustafsson 2023-11-21 21:10:22 Re: proposal: possibility to read dumped table's name from file