Re: parallelizing the archiver

From: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallelizing the archiver
Date: 2021-09-30 04:47:34
Message-ID: E9035E94-EC76-436E-B6C9-1C03FBD8EF54@amazon.com
Lists: pgsql-hackers

On 9/10/21, 10:42 AM, "Robert Haas" <robertmhaas(at)gmail(dot)com> wrote:
> I had kind of been thinking that the way to attack this problem is to
> go straight to allowing for a background worker, because the other
> problem with archive_command is that running a shell command like cp,
> scp, or rsync is not really safe. It won't fsync your data, it might
> not fail if the file is in the archive already, and it definitely
> won't succeed without doing anything if there's a byte for byte
> identical file in the archive and fail if there's a file with
> different contents already in the archive. Fixing that stuff by
> running different shell commands is hard, but it wouldn't be that hard
> to do it in C code, and you could then also extend whatever code you
> wrote to do batching and parallelism; starting more workers isn't
> hard.
>
> However, I can't see the idea of running a shell command going away
> any time soon, in spite of its numerous and severe drawbacks. Such an
> interface provides a huge degree of flexibility and allows system
> admins to whack around behavior easily, which you don't get if you
> have to code every change in C. So I think command-based enhancements
> are fine to pursue also, even though I don't think it's the ideal
> place for most users to end up.

I've given this quite a bit of thought. I hacked together a batching
approach for benchmarking, and it seemed to be a decent improvement,
but you're still shelling out every N files, and all of the drawbacks
of shell commands that you mentioned still apply. Batching may well be
worth doing anyway, but I get the impression that many believe we can
do better. So, I looked into adding support for setting up archiving
via an extension.

The attached patch is a first try at adding alternatives for
archive_command, restore_command, archive_cleanup_command, and
recovery_end_command. It adds the GUCs archive_library,
restore_library, archive_cleanup_library, and recovery_end_library.
Each of these accepts a library name that is loaded at startup,
similar to shared_preload_libraries. _PG_init() is still used for
initialization, and you can use the same library for multiple purposes
by checking the new exported variables (e.g.,
process_archive_library_in_progress). The library is then responsible
for implementing the relevant function, such as _PG_archive() or
_PG_restore(). The attached patch also demonstrates a simple
implementation for an archive_library that is similar to the sample
archive_command in the documentation.
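
To make the discussion concrete, here is a rough standalone sketch of the
file handling an archive library might do. It is not the code from the
attached patch: the names archive_file_sketch() and files_identical() and
the (src, archive_dir, fname) signature are hypothetical, and it uses only
POSIX calls, no PostgreSQL APIs. It assumes the semantics Robert described
above: succeed without doing anything if an identical file is already
archived, fail if a different one is, and fsync what gets written.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: true if both files can be read and match byte for byte. */
static bool
files_identical(const char *a, const char *b)
{
    FILE   *fa = fopen(a, "rb");
    FILE   *fb = fopen(b, "rb");
    bool    same = (fa != NULL && fb != NULL);

    while (same)
    {
        int     ca = fgetc(fa);
        int     cb = fgetc(fb);

        if (ca != cb)
            same = false;
        else if (ca == EOF)
            break;              /* both files ended at the same offset */
    }
    if (fa)
        fclose(fa);
    if (fb)
        fclose(fb);
    return same;
}

/* Hypothetical archive routine (not the patch's API); returns true on success. */
bool
archive_file_sketch(const char *src, const char *archive_dir, const char *fname)
{
    char    dst[4096], tmp[4096], buf[65536];
    int     in, out, dfd;
    ssize_t n = 0;
    bool    ok = true;

    if (snprintf(dst, sizeof(dst), "%s/%s", archive_dir, fname) >= (int) sizeof(dst) ||
        snprintf(tmp, sizeof(tmp), "%s/%s.tmp", archive_dir, fname) >= (int) sizeof(tmp))
        return false;

    /* Already archived: succeed only if the contents are identical. */
    if (access(dst, F_OK) == 0)
        return files_identical(src, dst);

    if ((in = open(src, O_RDONLY)) < 0)
        return false;
    if ((out = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600)) < 0)
    {
        close(in);
        return false;
    }

    /* Copy into a temporary file, handling short writes. */
    while (ok && (n = read(in, buf, sizeof(buf))) > 0)
    {
        ssize_t off = 0;

        while (off < n)
        {
            ssize_t w = write(out, buf + off, (size_t) (n - off));

            if (w < 0)
            {
                ok = false;
                break;
            }
            off += w;
        }
    }
    if (n < 0)
        ok = false;

    /* Flush the data before the file becomes visible under its final name. */
    if (fsync(out) != 0)
        ok = false;
    if (close(out) != 0)
        ok = false;
    close(in);

    /* link() fails if dst appeared in the meantime, so nothing is overwritten. */
    if (ok && link(tmp, dst) != 0)
        ok = false;
    unlink(tmp);
    if (!ok)
        return false;

    /* fsync the directory so the new entry itself is durable. */
    if ((dfd = open(archive_dir, O_RDONLY)) < 0)
        return false;
    ok = (fsync(dfd) == 0);
    close(dfd);
    return ok;
}
```

A caller would invoke something like archive_file_sketch("pg_wal/<segment>",
"/mnt/server/archivedir", "<segment>") once per file and retry later on
failure. Using link() rather than rename() for the final step is a design
choice in this sketch: rename() would silently replace an existing file,
which is one of the problems with plain cp.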

I tested the sample archive_command in the docs against the sample
archive_library implementation in the patch, and I saw about a 50%
speedup. (The archive_library actually syncs the files to disk, too.)
This is similar to the improvement from batching.

Of course, there are drawbacks to using an extension. Besides the
obvious added complexity of building an extension in C versus writing
a shell command, the patch disallows changing the libraries without
restarting the server. Also, the patch makes no effort to simplify
error handling, memory management, etc. This is left as an exercise
for the extension author.

I'm sure there are other ways to approach this, but I thought I'd give
it a try to see what was possible and to get the conversation started.

Nathan

Attachment: v1-0001-backup-module-proof-of-concept.patch (application/octet-stream, 45.6 KB)
