Re: archive modules

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "Bossart, Nathan" <bossartn(at)amazon(dot)com>, David Steele <david(at)pgmasters(dot)net>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, Magnus Hagander <magnus(at)hagander(dot)net>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: archive modules
Date: 2022-02-02 22:44:33
Message-ID: 20220202224433.GA1036711@nathanxps13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for the review!

On Wed, Feb 02, 2022 at 01:42:55PM -0500, Robert Haas wrote:
> I think avoiding ERROR is going to be impractical. Catching it in the
> contrib module seems OK. Catching it in the generic code is probably
> also possible to do in a reasonable way. Not catching the error also
> seems like it would be OK, since we expect errors to be infrequent.
> I'm not objecting to anything you did here, but I'm uncertain why
> adding basic_archive along shell_archive changes the calculus here in
> any way. It just seems like a separate problem.

The main scenario I'm thinking about is when there is no space left for
archives. The shell archiving logic is pretty good about avoiding ERRORs,
so when there is a problem executing the command, the archiver will retry
the command a few times before giving up for a while. If basic_archive
just ERROR'd due to ENOSPC, it would cause the archiver to restart. Until
space frees up, I believe the archiver will end up restarting every 10
seconds.

I thought some more about adding a generic exception handler for the
archiving callback. I think we'd need to add a new callback function that
would perform any required cleanup (e.g., closing any files that might be
left open). That part isn't too bad. However, module authors will also
need to keep in mind that the archiving callback runs in its own transient
memory context. If the module needs to palloc() something that needs to
stick around for a while, it will need to do so in a different memory
context. With sufficient documentation, maybe this part isn't too bad
either, but in the end, all of this is to save an optional ~15 lines of
code in the module. It's not crucial to do your own ERROR handling in your
archive module, but if you want to, you can use basic_archive as a good
starting point.

tl;dr - I left it the same.

> + /* Perform logging of memory contexts of this process */
> + if (LogMemoryContextPending)
> + ProcessLogMemoryContextInterrupt();
>
> Any special reason for moving this up higher? Not really an issue, just curious.

Since archive_library changes cause the archiver to restart, I thought it
might be good to move this before the process might exit in case
LogMemoryContextPending and ConfigReloadPending are both true.

> + gettext_noop("This is unused if
> \"archive_library\" does not indicate archiving via shell is
> enabled.")
>
> This contains a double negative. We could describe it more positively:
> This is used only if \"archive_library\" specifies archiving via
> shell. But that's actually a little confusing, because the way you've
> set it up, archiving via shell can be specified by writing either
> archive_library = '' or archive_library = 'shell'. I don't see any
> particularly good reason to allow that to be spelled in two ways.
> Let's pick one. Then here we can write either:
>
> (a) This is used only if archive_library = 'shell'.
> -or-
> (b) This is used only if archive_library is not set.
>
> IMHO, either of those would be clearer than what you have right now,
> and it would definitely be shorter.

I went with (b). That felt a bit more natural to me, and it was easier to
code because I don't have to add error checking for an empty string.

> +/*
> + * Callback that gets called to determine if the archive module is
> + * configured.
> + */
> +typedef bool (*ArchiveCheckConfiguredCB) (void);
> +
> +/*
> + * Callback called to archive a single WAL file.
> + */
> +typedef bool (*ArchiveFileCB) (const char *file, const char *path);
> +
> +/*
> + * Called to shutdown an archive module.
> + */
> +typedef void (*ArchiveShutdownCB) (void);
>
> I think that this is the wrong amount of comments. One theory is that
> the reader should refer to the documentation to understand how these
> callbacks work. In that case, having a separate comment for each one
> that doesn't really say anything is just taking up space. It would be
> better to have one comment for all three lines referring the reader to
> the documentation. Alternatively, one could take the position that the
> explanation should go into these comments, and then perhaps we don't
> even really need documentation. A one-line comment that doesn't really
> say anything non-obvious seems like the worst amount.

In my quest to write well-commented code, I sometimes overdo it. I
adjusted these comments in the new revision.

> + <warning>
> + <para>
> + There are considerable robustness and security risks in using
> archive modules
> + because, being written in the <literal>C</literal> language, they
> have access
> + to many server resources. Administrators wishing to enable archive modules
> + should exercise extreme caution. Only carefully audited modules should be
> + loaded.
> + </para>
> + </warning>
>
> Maybe I'm just old and jaded, but do we really need this? I know we
> have the same thing for background workers, but if anything that seems
> like an argument against duplicating it elsewhere. Lots of copies of
> essentially identical warnings aren't the way to great documentation;
> if we copy this here, we'll probably copy it to more places. And also,
> it seems a bit like warning people that they shouldn't give their
> complete financial records to total strangers about whom they have no
> little or no information. Do tell.

I removed this in the new revision.

> +<programlisting>
> +typedef bool (*ArchiveCheckConfiguredCB) (void);
> +</programlisting>
> +
> + If <literal>true</literal> is returned, the server will proceed with
> + archiving the file by calling the <function>archive_file_cb</function>
> + callback. If <literal>false</literal> is returned, archiving will not
> + proceed. In the latter case, the server will periodically call this
> + function, and archiving will proceed if it eventually returns
> + <literal>true</literal>.
>
> It's not obvious from reading this why anyone would want to provide
> this callback, or have it do anything other than 'return true'. But
> there actually is a behavior difference if you provide this and have
> it return false, vs. just having archiving itself fail. At least, the
> message "archive_mode enabled, yet archiving is not configured" will
> be emitted. So that's something we could mention here.

The blurb just above this provides a bit more information, but I tried to
add some additional context in the new revision anyway.

> I would suggest s/if it eventually/only when it/

Done.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v18-0001-Introduce-archive-modules-infrastructure.patch text/x-diff 11.8 KB
v18-0002-Add-test-archive-module.patch text/x-diff 14.1 KB
v18-0003-Add-documentation-for-archive-modules.patch text/x-diff 30.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-02-02 22:52:56 Re: Server-side base backup: why superuser, not pg_write_server_files?
Previous Message Tom Lane 2022-02-02 22:43:31 Re: Server-side base backup: why superuser, not pg_write_server_files?