Why does bootstrap and later initdb stages happen via client?

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Why does bootstrap and later initdb stages happen via client?
Date: 2021-09-08 19:07:15
Message-ID: 20210908190715.kb34s3f72kpefac5@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

While hacking on AIO I wanted to build the Windows portion from Linux. That
works surprisingly well with cross-building using --host=x86_64-w64-mingw32.

What didn't work as well was running things under Wine. It turns out that the
server itself works OK, but that initdb hangs because of a bug in Wine ([1]),
leading to the bootstrap process hanging while trying to read more input.

Which made me wonder: What is really the point of doing so much setup as part
of initdb? Of course a wine bug isn't a reason to change anything, but I see
other reasons it might be worth thinking about moving more of initdb's logic
into the backend.

There are of course historical reasons for things happening in initdb - the
setup logic didn't use to be C. But now that it is C, it seems a bit absurd to
read bootstrap data in initdb, write the data to a pipe, and then read it
again in the backend. It for sure doesn't make things faster.
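
To make the round trip concrete, here is a minimal sketch of the pattern
described above, not initdb's actual code: a parent process that already holds
the data in memory pushes it through a pipe to a child, which has to read it
all over again. The child script and the name feed_child are hypothetical
stand-ins for the bootstrap-mode backend and for initdb's side of the pipe.

```python
import subprocess
import sys

# Hypothetical stand-in for the backend in bootstrap mode: it only reads
# lines from stdin and reports how many it received.
CHILD_SOURCE = "import sys; print(sum(1 for _ in sys.stdin))"


def feed_child(lines):
    """Send lines the parent already holds through a pipe to a child process."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    # Each line is copied into the pipe here and read again by the child.
    out, _ = child.communicate("".join(line + "\n" for line in lines))
    return int(out.strip())
```

Calling feed_child with three placeholder lines returns 3. The point of the
sketch is only that the same data is handled twice, once on each side of the
pipe, which is the overhead the paragraph above questions.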

If more of initdb happened in the backend, it seems plausible that we could
avoid the restart of the server between bootstrap and the later setup phases -
which likely would result in a decent speedup. And trialing different
max_connections and shared_buffers settings would be a lot faster without
retries.
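
As a rough illustration of why the trials matter for speed, the sketch below
walks a list of candidate values and counts how many probes it takes to find
one that works. Everything in it is an assumption for illustration: the helper
name first_working, the candidate values, and the probe are made up. As the
paragraph above implies, the cost of each probe depends on where it runs, so
an in-backend check could be much cheaper than a separately started process.

```python
def first_working(candidates, probe):
    """Return the first candidate that probe accepts and the number of probes run."""
    attempts = 0
    for value in candidates:
        attempts += 1
        # In a client-driven setup, each probe could mean a separate process start.
        if probe(value):
            return value, attempts
    return None, attempts
```

For example, first_working([100, 50, 20], lambda n: n <= 50) returns (50, 2):
two probes were needed, so the per-probe cost is paid twice.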

Besides potential speedups I also think there are architectural reasons to
prefer doing some of initdb's work in the backend - it would let us avoid
some duplicated infrastructure and avoid leaking subsystem details to one more
place outside the subsystem.

The reason I CCed Peter is that he at some point proposed ([2]) having the
backend initialize itself via a base backup. I think if we generally moved
more of the data directory initialization into the backend that'd probably
architecturally work a bit better.

I'm not planning to work on this in the near future. But I would like to do so
at some point. And it might be worth considering pushing future additions to
initdb to be moved server-side via functions that initdb calls, rather than
having initdb control everything.

Greetings,

Andres Freund

[1] https://bugs.winehq.org/show_bug.cgi?id=51719
[2] https://www.postgresql.org/message-id/61b8d18d-c922-ac99-b990-a31ba63cdcbb%402ndquadrant.com
