Inconsistency in startup process's MyBackendId and procsignal array registration with ProcSignalInit()

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Inconsistency in startup process's MyBackendId and procsignal array registration with ProcSignalInit()
Date: 2021-10-09 13:22:16
Message-ID: CALj2ACU7x4Ze+nVObGFMZpcE6TVN0LQtcD320UiBpEne7e1SVw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

It looks like auxiliary processes will not have a valid MyBackendId as
they don't call InitPostgres() and SharedInvalBackendInit() unlike
backends. But the startup process (which is an auxiliary process) in
hot standby mode seems to be different in the way that it does have a
valid MyBackendId as SharedInvalBackendInit() gets called from
InitRecoveryTransactionEnvironment(). The SharedInvalBackendInit()
usually stores the MyBackendId in the caller's PGPROC structure i.e.
MyProc->backendId. The auxiliary processes (including the startup
process) usually register themselves in procsignal array with
ProcSignalInit(MaxBackends + MyAuxProcType + 1) unlike the backends
with ProcSignalInit(MyBackendId) after SharedInvalBackendInit() in
InitPostgres().

The problem comes when a postgres process wants to send a multiplexed
SIGUSR1 signal (probably using SendProcSignal()) to the startup
process after receiving its ProcSignal->psh_slot[] with its backendId
from the PGPROC (the postgres process can get the startup process
PGPROC structure from AuxiliaryPidGetProc()). Remember the startup
process has registered in the procsignal array with
ProcSignalInit(MaxBackends + MyAuxProcType + 1), not with the
ProcSignalInit(MyBackendId) like the backends did. So, the postgres
process, wanting to send SIGUSR1 to the startup process, refers to the
wrong ProcSignal->psh_slot[] and may not send the signal.

Is this inconsistency of MyBackendId for a startup process a problem
at all? Thoughts?

These are the following ways I think we can fix it, if at all some
other hacker agrees that it is actually an issue:

1) Fix the startup process code, probably by unregistering the
procsignal array entry that was made with ProcSignalInit(MaxBackends +
MyAuxProcType + 1) in AuxiliaryProcessMain() and register with
ProcSignalInit(MyBackendId) immediately after SharedInvalBackendInit()
calculates the MyBackendId in with SharedInvalBackendInit() in
InitRecoveryTransactionEnvironment(). This seems risky to me as
unregistering and registering ProcSignal array involves some barriers
and during the registering and unregistering window, the startup
process may miss the SIGUSR1.

2) Ensure that the process, that wants to send the startup process
SIGUSR1 signal, doesn't use the backendId from the startup process
PGPROC, in which case it has to loop over all the entries of
ProcSignal->psh_slot[] array to find the entry with the startup
process PID. It seems easier and less riskier but only caveat is that
the sending process shouldn't look at the backendId from auxiliary
process PGPROC, instead it should just traverse the entire proc signal
array to find the right slot.

3) Add a comment around AuxiliaryPidGetProc() that says "auxiliary
processes don't have valid backend ids, so don't use the backendId
from the returned PGPROC".

(2) and (3) seem reasonable to me. Thoughts?

Regards,
Bharath Rupireddy.

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2021-10-09 13:23:56 enhance pg_log_backend_memory_contexts() to log memory contexts of auxiliary processes
Previous Message Matthias van de Meent 2021-10-09 12:49:56 Re: RFC: compression dictionaries for JSONB