Re: parallelizing the archiver

From: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallelizing the archiver
Date: 2021-10-01 21:21:04
Message-ID: B2B73565-636B-4883-9D25-840F8300B629@amazon.com
Lists: pgsql-hackers

On 10/1/21, 12:08 PM, "Andrey Borodin" <x4mmm(at)yandex-team(dot)ru> wrote:
> On 30 Sep 2021, at 09:47, Bossart, Nathan <bossartn(at)amazon(dot)com> wrote:
>> I tested the sample archive_command in the docs against the sample
>> archive_library implementation in the patch, and I saw about a 50%
>> speedup. (The archive_library actually syncs the files to disk, too.)
>> This is similar to the improvement from batching.
> Why test sample against sample? I think if one tests this against a real archive tool doing archive_status lookup and ready->done renaming, the results will be much different.

My intent was to demonstrate the impact of reducing the amount of
overhead when archiving. I don't doubt that third-party archive tools
can show improvements by doing batching/parallelism behind the scenes.
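
For readers unfamiliar with the archive_status lookup and ready->done
renaming mentioned above, here is a minimal sketch of the idea. It is an
illustration only and is not taken from any of the tools discussed in
this thread. The function names (ready_segments, mark_done,
archive_batch) and the `store` callback are hypothetical, and the sketch
assumes the status directory contains "<name>.ready" marker files that
become "<name>.done" once a segment has been archived.

```python
import os


def ready_segments(status_dir, limit=None):
    """Return names that have a .ready marker, sorted by name.

    Assumption: sorting by file name is a good enough ordering for
    this illustration.
    """
    names = sorted(
        entry[: -len(".ready")]
        for entry in os.listdir(status_dir)
        if entry.endswith(".ready")
    )
    return names if limit is None else names[:limit]


def mark_done(status_dir, name):
    """Rename <name>.ready to <name>.done after the segment is stored."""
    os.rename(
        os.path.join(status_dir, name + ".ready"),
        os.path.join(status_dir, name + ".done"),
    )


def archive_batch(status_dir, store, limit=8):
    """Archive up to `limit` ready segments in a single pass.

    `store` is a hypothetical callback that copies or uploads one
    segment and raises an exception on failure, so a segment is only
    marked done after it has been stored.
    """
    archived = []
    for name in ready_segments(status_dir, limit):
        store(name)
        mark_done(status_dir, name)
        archived.append(name)
    return archived
```

A tool built this way pays the process start-up cost once per batch
rather than once per WAL file, which is the kind of overhead reduction
being discussed.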

>> Of course, there are drawbacks to using an extension. Besides the
>> obvious added complexity of building an extension in C versus writing
>> a shell command, the patch disallows changing the libraries without
>> restarting the server. Also, the patch makes no effort to simplify
>> error handling, memory management, etc. This is left as an exercise
>> for the extension author.
> I think the real problem with extension is quite different than mentioned above.
> There are many archive tools that already feature parallel archiving: PgBackrest, wal-e, wal-g, pg_probackup, pghoard, pgbarman and others. These tools by far outweigh tools that don't look into archive_status to parallelize archival.
> And we are going to ask them to also add a C extension without any feasible benefit to the user. You only get some restrictions, like a system restart to enable the shared library.
>
> I think we need a design that legalises already existing de-facto standard features in archive tools. Or even better, enables these tools to be more efficient, reliable, etc. Either way we will create legacy code from scratch.

My proposal wouldn't require any changes to any of these utilities.
This design just adds a new mechanism that would allow end users to
set up archiving a different way with less overhead in hopes that it
will help them keep up. I suspect a lot of work has been put into the
archive tools you mentioned to make sure they can keep up with high
rates of WAL generation, so I'm skeptical that anything we do here
will really benefit them all that much. Ideally, we'd do something
that improves matters for everyone, though. I'm open to suggestions.

Nathan
