Re: parallelizing the archiver

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "Bossart, Nathan" <bossartn(at)amazon(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallelizing the archiver
Date: 2021-09-10 15:48:54
Message-ID: CAOBaU_YpHNp4aCEL5v-3UFVSdN65nCZ6=AR+o6q7H+A=C5huNg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Sep 10, 2021 at 11:22 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> Well, I guess I'm not convinced. Perhaps people with more knowledge of
> this than I may already know why it's beneficial, but in my experience
> commands like 'cp' and 'scp' are usually limited by the speed of I/O,
> not the fact that you only have one of them running at once. Running
> several at once, again in my experience, is typically not much faster.
> On the other hand, scp has a LOT of startup overhead, so it's easy to
> see the benefits of batching.

I totally agree that batching as many files as possible in a single
command is probably what's going to achieve the best performance. But
if the archiver only gets an answer from the archive_command once it
has tried to process all of the files, it also means that postgres
won't be able to remove any WAL file until all of them have been
processed. That means users will likely have to limit the batch size
and therefore pay more startup overhead than they would like. In the
case of archiving to a server with high latency / connection overhead,
it may be better to be able to run multiple commands in parallel. I
may be overthinking this, and feedback from people with more
experience in that area would definitely be welcome.

> That is possibly true. I think it might work to just assume that you
> have to retry everything if it exits non-zero, but that requires the
> archive command to be smart enough to do something sensible if an
> identical file is already present in the archive.

Yes, it could be. I think we need more feedback on that too.
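The "smart enough" behaviour mentioned above could look something like
the following sketch (hypothetical names, not any existing
archive_command): on retry, an already-present, byte-identical file is
treated as success, while a conflicting file with different contents
fails loudly:

```python
# Hypothetical sketch of an idempotent archive step: safe to retry
# after a non-zero exit, because a re-run of an already-archived file
# is recognized and treated as success.
import filecmp
import os
import shutil

def idempotent_archive(src, archive_dir):
    dst = os.path.join(archive_dir, os.path.basename(src))
    if os.path.exists(dst):
        if filecmp.cmp(src, dst, shallow=False):
            # Identical file already archived: a previous attempt
            # actually succeeded, so report success on retry.
            return True
        # Same name, different contents: never silently overwrite.
        raise RuntimeError("%s exists with different contents" % dst)
    shutil.copy(src, dst)
    return True
```

Under this scheme the archiver can simply retry the whole set after a
failure, since re-archiving a file that already made it is harmless.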

> Sure. Actually, I think a background worker would be better than a
> separate daemon. Then it could just talk to shared memory directly.

I thought about that too, but I was under the impression that most
people would want to implement a custom daemon (or already have one)
in some more parallel/thread-friendly language.
