Re: parallelizing the archiver

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: "Bossart, Nathan" <bossartn(at)amazon(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallelizing the archiver
Date: 2021-10-19 16:12:00
Message-ID: CA+TgmoaMLCB2dRdfHf7DuB-cE=K+=6jmght6sLZ61MrP-AB+CA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Oct 19, 2021 at 10:19 AM Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> But, is logical decoding really that great an example? I mean, we build pgoutput.so as a library, we don't provide it compiled-in. So we could build the "shell archiver" based on that pattern, in which case we should create a postmaster/shell_archiver directory or something like that?

Well, I guess you could also use parallel contexts as an example.
There, the core facilities that most people will use are baked into
the server, but you can provide your own in an extension and the
parallel context stuff will happily call it for you if you so request.

I don't think the details here are too important. I'm just saying that
not everything needs to depend on _PG_init() as a way of bootstrapping
itself. TBH, if I ran the zoo and also had infinite time to tinker
with stuff like this, I'd probably make a pass through the hooks we
already have and try to refactor as many of them as possible to use
some mechanism other than _PG_init() to bootstrap themselves. That
mechanism actually sucks. When we use other mechanisms -- like a
language "C" function that knows the shared object name and function
name -- then loading is triggered when it's needed, and the user gets the
behavior they want. Similarly with logical decoding and FDWs -- you,
as the user, say that you want this or that kind of logical decoding
or FDW or C function or whatever -- and then the system either notices
that it's already loaded and does what you want, or notices that it's
not loaded and loads it, and then does what you want.
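The on-demand pattern described above can be sketched as a toy model: resolve a library name plus a function name, load the library only if it isn't loaded yet, then hand back the function. This is not PostgreSQL code. `lookup_by_name`, `AVAILABLE_LIBRARIES` and `demo_module` are hypothetical names, and "loading" is simulated with a dictionary where a real server would open a shared object (e.g. via dlopen()/dlsym()).

```python
from typing import Callable, Dict

# Hypothetical "shared libraries": name -> loader returning the exported symbols.
# Loading is simulated here; a real server would open a .so file instead.
AVAILABLE_LIBRARIES: Dict[str, Callable[[], Dict[str, Callable[[int], int]]]] = {
    "demo_module": lambda: {
        "double_it": lambda x: 2 * x,
        "negate_it": lambda x: -x,
    },
}

_loaded: Dict[str, Dict[str, Callable[[int], int]]] = {}  # already-loaded libraries
load_count: Dict[str, int] = {}                            # loads per library


def lookup_by_name(library: str, function: str) -> Callable[[int], int]:
    """Return `function` from `library`, loading the library only on first use."""
    if library not in _loaded:
        if library not in AVAILABLE_LIBRARIES:
            raise LookupError(f"unknown library {library!r}")
        _loaded[library] = AVAILABLE_LIBRARIES[library]()  # simulated load
        load_count[library] = load_count.get(library, 0) + 1
    symbols = _loaded[library]
    if function not in symbols:
        raise LookupError(f"no function {function!r} in {library!r}")
    return symbols[function]                                # simulated symbol lookup


# The user just names what they want; nothing has to be preloaded.
f = lookup_by_name("demo_module", "double_it")
g = lookup_by_name("demo_module", "negate_it")
print(f(21), g(5), load_count["demo_module"])  # 42 -5 1
```

The second lookup finds the library already in the cache and reuses it, so the library is loaded exactly once no matter how many functions are requested from it.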

But when the bootstrapping mechanism is _PG_init(), then the user has
got to make sure the library is loaded at the correct time. They have
to know whether it should go into shared_preload_libraries or whether
it should be put into one of the other various GUCs or if it can be
loaded on the fly with LOAD. If they don't load it in the right way,
or if it doesn't get loaded at all, well then probably it just
silently doesn't work. Plus there can be weird cases if it gets loaded
into some backends but not others and things like that.
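The _PG_init() failure mode can be illustrated with a toy model, assuming a library whose only effect is to install a hook as a side effect of being loaded. `Backend`, `archive_hook`, `finish_segment` and `my_archiver` are hypothetical illustrations, not PostgreSQL APIs. Each backend has its own hook variable, so a backend that never loaded the library simply skips the call, with no error.

```python
from typing import Callable, Dict, List, Optional


class Backend:
    """Hypothetical model of one server process with its own hook variable."""

    def __init__(self, preload: List[str]):
        self.archive_hook: Optional[Callable[[str], None]] = None
        self.archived: List[str] = []
        for lib in preload:          # plays the role of shared_preload_libraries
            LIBRARY_INIT[lib](self)  # the library's _PG_init()-style entry point

    def finish_segment(self, segment: str) -> None:
        # Core code only calls the hook if some library installed it.
        if self.archive_hook is not None:
            self.archive_hook(segment)
        # Otherwise nothing happens and nothing reports a problem.


def my_archiver_init(backend: Backend) -> None:
    """Stands in for _PG_init(): installs the hook as a side effect of loading."""
    backend.archive_hook = backend.archived.append


LIBRARY_INIT: Dict[str, Callable[[Backend], None]] = {"my_archiver": my_archiver_init}

loaded = Backend(preload=["my_archiver"])
missed = Backend(preload=[])  # library not loaded in this process
for b in (loaded, missed):
    b.finish_segment("000000010000000000000001")
print(loaded.archived, missed.archived)  # ['000000010000000000000001'] []
```

Both processes run the same core code, but only the one where the library happened to be loaded does the work; the other fails silently.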

And here we seem to have an opportunity to improve the interface by
not depending on it.

--
Robert Haas
EDB: http://www.enterprisedb.com
