Re: straightening out backend process startup

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: andres(at)anarazel(dot)de
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: straightening out backend process startup
Date: 2021-08-03 07:50:24
Message-ID: 20210803.165024.2159487945907548407.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Mon, 2 Aug 2021 09:41:24 -0700, Andres Freund <andres(at)anarazel(dot)de> wrote in
> Hi,
>
> I've previously complained ([1]) that process initialization has gotten
> very complicated. I hit this once more last week when trying to commit
> one of the shared memory stats pieces...
>
> I think there's quite a few different issues around this - here I'm just
> trying to tackle a few of the most glaring (to me):
>
> - AuxiliaryProcessMain() is used for two independent tasks: Start bootstrap /
> checker mode and starting auxiliary processes. In HEAD there's maybe 5 lines
> out 250 that are actually common to both uses.
>
> A related oddity is that we reserve shared memory resources for bootstrap &
> checker aux processes, despite those never existing.
>
> This is addressed in patches 1-7
>
> - The order of invocation of InitProcess()/InitAuxiliaryProcess() and
> BaseInit() depends on EXEC_BACKEND. Due to that there often is no single
> place initialization code can be put if it needs any locks.
>
> This is addressed in patches 8-9
>
> - PostgresMain() has code for single user and multi user interleaved, making
> it unnecessarily hard to understand what's going on.
>
> This is addressed in patches 10
>
>
> This isn't a patch series ready to commit, there's a bunch of polishing that
> needs to be done if there's agreement.
>
>
> Questions I have:
>
> - What exactly to do with checker mode: Keep it as part of bootstrap, separate
> it out completely? What commandline flags?

Checker tries to attach shared memory just to make sure it is actually
attachable with a set of parameters. It is similar to bootstrap as it
is not run under postmaster but similar to auxiliary process as it
needs to attach shared memory. If we are going to get rid of
shared-memory access by bootstrap, or get rid of the bootstrap itself,
checker should be separated out from bootstrap.

> - I used a separate argv entry to pass the aux proc type - do we rather want
> to go for the approach that e.g. bgworker went for? Adding code for string
> splitting seems a bit unnecessary to me.

It seems to me separate entry is cleaner and robust.

> - PostgresMainSingle() should probably not be in postgres.c. We could put it
> into postinit.c or ..?

PostgresMainSingle() looks like the single-process version of
PostgresMain so it is natural that they are placed together in
postgres.c. If PostgresMainSingle is constructed as initializing
standalone first then calling PostgresMain, it might be right that
PostgresMain calls the initialization function resides in postinit.c
if !IsUnderPostmaster.

PostgresMain()
{
if (!IsUnderPostmaster)
InitSinglePostgres(argv[0]);
...

> - My first attempt at PostgresMainSingle() separated the single/multi user
> cases a bit more than the code right now, by having a PostgresMainCommon()
> which was called by PostgresMainSingle(), PostgresMain(). *Common only
> started with the MessageContext allocation, which did have the advantage of
> splitting out a few of the remaining conditional actions in PostgresMain()
> (PostmasterContext, welcome banner, Log_disconnections). But lead to a bit
> more duplication. I don't really have an opinion on what's better.

I'm not sure how it looked like, but isn't it reasonable that quickdie
and log_disconnections(). handle IsUnderPostmaster instead? Or for
log_disconnections, Log_disconnections should be turned off at
standalone-initialization?

> - I had to move the PgStartTime computation to a bit earlier for single user
> mode. That seems to make sense to me anyway, given that postmaster does so
> fairly early too.
>
> Any reason that'd be a bad idea?
>
> Arguably it should even be a tad earlier to be symmetric.

Why don't you move the code for multiuser as earlier as standalone does?

> There's one further issue that I think is big enough to be worth
> tackling in the near future: Too many things depend on BackendIds.
>
> Aux processes need procsignal and backend status reporting, which use
> BackendId for indexing. But they don't use sinval, so they don't have a
> BackendId - so we have hacks to work around that in a few places. If we
> instead make those places use pgprocno for indexing the whole issue
> vanishes.
>
> In fact, I think there's a good argument to be made that we should
> entirely remove the concept of BackendIds and just use pgprocnos. We
> have a fair number of places like SignalVirtualTransaction() that need
> to search the procarray just to find the proc to signal based on the
> BackendId. If we used pgprocno instead, that'd not be needed.
>
> But perhaps that's a separate thread.
>
> Greetings,
>
> Andres Freund
>
> [1] https://postgr.es/m/20210402002240.56cuz3uo3alnqwae%40alap3.anarazel.de

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2021-08-03 07:56:12 Re: Remove unused 'len' from pg_stat_recv_* functions
Previous Message houzj.fnst@fujitsu.com 2021-08-03 07:40:22 RE: Parallel Inserts (WAS: [bug?] Missed parallel safety checks..)