Re: Inconsistency in startup process's MyBackendId and procsignal array registration with ProcSignalInit()

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: masao(dot)fujii(at)oss(dot)nttdata(dot)com
Cc: bharath(dot)rupireddyforpostgres(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Inconsistency in startup process's MyBackendId and procsignal array registration with ProcSignalInit()
Date: 2021-10-11 07:11:02
Message-ID: 20211011.161102.873686464224461228.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Mon, 11 Oct 2021 15:24:46 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
>
>
> On 2021/10/09 22:22, Bharath Rupireddy wrote:
> > Hi,
> > It looks like auxiliary processes will not have a valid MyBackendId as
> > they don't call InitPostgres() and SharedInvalBackendInit() unlike
> > backends. But the startup process (which is an auxiliary process) in
> > hot standby mode seems to be different in the way that it does have a
> > valid MyBackendId as SharedInvalBackendInit() gets called from
> > InitRecoveryTransactionEnvironment(). The SharedInvalBackendInit()
> > usually stores the MyBackendId in the caller's PGPROC structure i.e.
> > MyProc->backendId. The auxiliary processes (including the startup
> > process) usually register themselves in procsignal array with
> > ProcSignalInit(MaxBackends + MyAuxProcType + 1) unlike the backends
> > with ProcSignalInit(MyBackendId) after SharedInvalBackendInit() in
> > InitPostgres().
> > The problem comes when a postgres process wants to send a multiplexed
> > SIGUSR1 signal (probably using SendProcSignal()) to the startup
> > process after receiving its ProcSignal->psh_slot[] with its backendId
> > from the PGPROC (the postgres process can get the startup process
> > PGPROC structure from AuxiliaryPidGetProc()). Remember the startup
> > process has registered in the procsignal array with
> > ProcSignalInit(MaxBackends + MyAuxProcType + 1), not with the
> > ProcSignalInit(MyBackendId) like the backends did. So, the postgres
> > process, wanting to send SIGUSR1 to the startup process, refers to the
> > wrong ProcSignal->psh_slot[] and may not send the signal.
> > Is this inconsistency of MyBackendId for a startup process a problem
> > at all? Thoughts?
> > These are the following ways I think we can fix it, if at all some
> > other hacker agrees that it is actually an issue:
> > 1) Fix the startup process code, probably by unregistering the
> > procsignal array entry that was made with ProcSignalInit(MaxBackends +
> > MyAuxProcType + 1) in AuxiliaryProcessMain() and register with
> > ProcSignalInit(MyBackendId) immediately after SharedInvalBackendInit()
> > calculates the MyBackendId in with SharedInvalBackendInit() in
> > InitRecoveryTransactionEnvironment(). This seems risky to me as
> > unregistering and registering ProcSignal array involves some barriers
> > and during the registering and unregistering window, the startup
> > process may miss the SIGUSR1.
> > 2) Ensure that the process, that wants to send the startup process
> > SIGUSR1 signal, doesn't use the backendId from the startup process
> > PGPROC, in which case it has to loop over all the entries of
> > ProcSignal->psh_slot[] array to find the entry with the startup
> > process PID. It seems easier and less riskier but only caveat is that
> > the sending process shouldn't look at the backendId from auxiliary
> > process PGPROC, instead it should just traverse the entire proc signal
> > array to find the right slot.
> > 3) Add a comment around AuxiliaryPidGetProc() that says "auxiliary
> > processes don't have valid backend ids, so don't use the backendId
> > from the returned PGPROC".
> > (2) and (3) seem reasonable to me. Thoughts?

(I'm not sure how the trouble happens.)
2 and 3 looks like fixing inconsistency by another inconsistency.
I'm not sure 1 is acceptable.

> How about modifying SharedInvalBackendInit() so that it accepts
> BackendId as an argument and allocates the ProcState entry of
> the specified BackendId? That is, the startup process determines
> that its BackendId is "MaxBackends + MyAuxProcType (=StartupProcess) +
> 1"
> in AuxiliaryProcessMain(), and then it passes that BackendId to
> SharedInvalBackendInit() in InitRecoveryTransactionEnvironment().
>
> Maybe you need to enlarge ProcState array so that it also handles
> auxiliary processes if it does not for now.

It seems to me that the backendId on startup process is used only for
vxid generation. Actually "make check-world" doesn't fail by skipping
SharedInvalBackendinit() (and disabling an assertion).

I thought that we could decouple vxid from backend ID (or sinval code)
by using pgprocno for vxid generation instead of backend ID. "make
check-world" doesn't fail with that change, too. (I don't think
"doesn't fail" ncecessarily mean that that change is correct, though),
but vxid gets somewhat odd after the change..

=# select distinct virtualxid from pg_locks;
virtualxid
------------

116/1 # startup
99/48 # backend 1
...

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message houzj.fnst@fujitsu.com 2021-10-11 07:20:05 RE: Added schema level support for publication.
Previous Message vignesh C 2021-10-11 07:05:28 Re: Added schema level support for publication.