Re: archive modules

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: "Bossart, Nathan" <bossartn(at)amazon(dot)com>, David Steele <david(at)pgmasters(dot)net>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: archive modules
Date: 2022-02-02 18:42:55
Message-ID: CA+TgmobN3L+ddDK_CQitfR_dSSoeXVxkUu=d57xioh-mW8EXoQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 31, 2022 at 8:36 PM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
> If basic_archive is to be in contrib, we probably want to avoid restarting
> the archiver every time the module ERRORs. I debated trying to add a
> generic exception handler that all archive modules could use, but I suspect
> many will have unique cleanup requirements. Plus, AFAICT restarting the
> archiver isn't terrible, it just causes most of the normal retry logic to
> be skipped.
>
> I also looked into rewriting basic_archive to avoid ERRORs and return false
> for all failures, but this was rather tedious. Instead, I just introduced
> a custom exception handler for basic_archive's archive callback. This
> allows us to ERROR as necessary (which helps ensure that failures show up
> in the server logs), and the archiver can treat it like a normal failure
> and avoid restarting.

I think avoiding ERROR is going to be impractical. Catching it in the
contrib module seems OK. Catching it in the generic code is probably
also possible to do in a reasonable way. Not catching the error also
seems like it would be OK, since we expect errors to be infrequent.
I'm not objecting to anything you did here, but I'm uncertain why
adding basic_archive along shell_archive changes the calculus here in
any way. It just seems like a separate problem.

+ /* Perform logging of memory contexts of this process */
+ if (LogMemoryContextPending)
+ ProcessLogMemoryContextInterrupt();

Any special reason for moving this up higher? Not really an issue, just curious.

+ gettext_noop("This is unused if
\"archive_library\" does not indicate archiving via shell is
enabled.")

This contains a double negative. We could describe it more positively:
This is used only if \"archive_library\" specifies archiving via
shell. But that's actually a little confusing, because the way you've
set it up, archiving via shell can be specified by writing either
archive_library = '' or archive_library = 'shell'. I don't see any
particularly good reason to allow that to be spelled in two ways.
Let's pick one. Then here we can write either:

(a) This is used only if archive_library = 'shell'.
-or-
(b) This is used only if archive_library is not set.

IMHO, either of those would be clearer than what you have right now,
and it would definitely be shorter.

+/*
+ * Callback that gets called to determine if the archive module is
+ * configured.
+ */
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+
+/*
+ * Callback called to archive a single WAL file.
+ */
+typedef bool (*ArchiveFileCB) (const char *file, const char *path);
+
+/*
+ * Called to shutdown an archive module.
+ */
+typedef void (*ArchiveShutdownCB) (void);

I think that this is the wrong amount of comments. One theory is that
the reader should refer to the documentation to understand how these
callbacks work. In that case, having a separate comment for each one
that doesn't really say anything is just taking up space. It would be
better to have one comment for all three lines referring the reader to
the documentation. Alternatively, one could take the position that the
explanation should go into these comments, and then perhaps we don't
even really need documentation. A one-line comment that doesn't really
say anything non-obvious seems like the worst amount.

+ <warning>
+ <para>
+ There are considerable robustness and security risks in using
archive modules
+ because, being written in the <literal>C</literal> language, they
have access
+ to many server resources. Administrators wishing to enable archive modules
+ should exercise extreme caution. Only carefully audited modules should be
+ loaded.
+ </para>
+ </warning>

Maybe I'm just old and jaded, but do we really need this? I know we
have the same thing for background workers, but if anything that seems
like an argument against duplicating it elsewhere. Lots of copies of
essentially identical warnings aren't the way to great documentation;
if we copy this here, we'll probably copy it to more places. And also,
it seems a bit like warning people that they shouldn't give their
complete financial records to total strangers about whom they have no
little or no information. Do tell.

+<programlisting>
+typedef bool (*ArchiveCheckConfiguredCB) (void);
+</programlisting>
+
+ If <literal>true</literal> is returned, the server will proceed with
+ archiving the file by calling the <function>archive_file_cb</function>
+ callback. If <literal>false</literal> is returned, archiving will not
+ proceed. In the latter case, the server will periodically call this
+ function, and archiving will proceed if it eventually returns
+ <literal>true</literal>.

It's not obvious from reading this why anyone would want to provide
this callback, or have it do anything other than 'return true'. But
there actually is a behavior difference if you provide this and have
it return false, vs. just having archiving itself fail. At least, the
message "archive_mode enabled, yet archiving is not configured" will
be emitted. So that's something we could mention here.

I would suggest s/if it eventually/only when it/

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2022-02-02 18:44:18 Re: Server-side base backup: why superuser, not pg_write_server_files?
Previous Message John Naylor 2022-02-02 18:40:09 Re: A qsort template