Re: parallelizing the archiver

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallelizing the archiver
Date: 2021-10-22 14:42:01
Message-ID: CABUevEyqLAwRbWOb7_cSrpRZPknF6t4H5Oq7CPdkABp5ECw0CA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 21, 2021 at 9:51 PM Bossart, Nathan <bossartn(at)amazon(dot)com> wrote:

> On 10/20/21, 3:23 PM, "Bossart, Nathan" <bossartn(at)amazon(dot)com> wrote:
> > Alright, I reworked the patch a bit to maintain backward
> > compatibility. My initial intent for 0001 was to just do a clean
> > refactor to move the shell archiving stuff to its own file. However,
> > after I did that, I realized that adding the hook wouldn't be too much
> > more work, so I did that as well. This seems to be enough to support
> > custom archiving modules. I included a basic example of such a module
> > in 0002. 0002 is included primarily for demonstration purposes.
>
> It looks like the FreeBSD build is failing because sys/wait.h is
> missing. Here is an attempt at fixing that.
>

I still like the idea of loading the library via a special parameter,
archive_library or such.

One reason for that is that adding/removing modules in
shared_preload_libraries has a terrible UX in that you have to replace the
whole thing. This makes it much more complex to deal with when different
modules just want to add to it.

E.g. my awsome backup program could set
archive_library='my_awesome_backups', and know it didn't break anything
else. but it couldn't set shared_preload_libraries='my_awesome_bacukps',
because then it might break a bunch of other modules that used to be there.
So it has to go try to parse the whole config and figure out where to make
such modifications.

Now, this could *also* be solved by allowing shared_preload_library to be a
"list" instead of a string, and allow postgresql.conf to accept syntax like
shared_preload_libraries+='my_awesome_backups'.

But without that level fo functionality available, I think a separate
parameter for the archive library would be a good thing.

Other than that:
+
+/*
+ * Is WAL archiving configured? For consistency with previous releases,
this
+ * checks that archive_command is set when archiving via shell is enabled.
+ * Otherwise, we just check that an archive function is set, and it is the
+ * responsibility of that archive function to ensure it is properly
configured.
+ */
+#define XLogArchivingConfigured() \
+ (PG_archive && (PG_archive != shell_archive ||
XLogArchiveCommand[0] != '\0'))

Wouldn't that be better as a callback into the module? So that
shell_archive would implement the check for XLogArchiveCommand. Then
another third party module can make it's own decision on what to check. And
PGarchive would then be a struct that holds a function pointer to the
archive command and another function pointer to the isenabled command? (I
think having a struct for it would be useful regardless -- for possible
future extensions with more API points).

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-10-22 14:53:31 Re: Experimenting with hash tables inside pg_dump
Previous Message Magnus Hagander 2021-10-22 14:33:47 Re: parallelizing the archiver