Re: parallelizing the archiver

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "Bossart, Nathan" <bossartn(at)amazon(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallelizing the archiver
Date: 2021-09-10 14:19:29
Message-ID: CAOBaU_ZFXHgZo=X6-vUscgKuwvXC1pTKeerDsZyEbbFbyjt0bg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Sep 10, 2021 at 9:13 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> To me, it seems way more beneficial to think about being able to
> invoke archive_command with many files at a time instead of just one.
> I think for most plausible archive commands that would be way more
> efficient than what you propose here. It's *possible* that if we had
> that, we'd still want this, but I'm not even convinced.

Those approaches don't really seem mutually exclusive, do they? In both
cases you will need to internally track the status of each WAL file and
handle non-contiguous file sequences. In the case of parallel commands,
you only need the additional knowledge that some command is already
working on a file. Wouldn't it be even better to eventually be able to
launch multiple batches of multiple files rather than a single batch?
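To make the shared bookkeeping concrete, here is a minimal sketch, assuming a simple in-memory status table. ArchiveTracker, SegState, claim(), finish() and archived_upto() are hypothetical names for illustration, not PostgreSQL's archiver code. It also assumes segment names sort in WAL order (true within a single timeline). The same table serves parallel single-file commands (claim(1) per worker), one batch (claim(n)) or several concurrent batches.

```python
from enum import Enum


class SegState(Enum):
    PENDING = "pending"          # waiting to be archived
    IN_PROGRESS = "in_progress"  # some command is already working on it
    DONE = "done"                # archived successfully


class ArchiveTracker:
    """Hypothetical per-segment status table (illustration only)."""

    def __init__(self, segments):
        # Assumption: segment names sort in WAL order within one timeline,
        # so a sorted list gives the archiving order.
        self.order = sorted(segments)
        self.state = {seg: SegState.PENDING for seg in self.order}

    def claim(self, n=1):
        """Mark up to n pending segments as in progress and return them.

        claim(1) per worker models parallel single-file commands; claim(n)
        models one batch, and several workers each calling claim(n) model
        multiple concurrent batches.
        """
        batch = []
        for seg in self.order:
            if len(batch) >= n:
                break
            if self.state[seg] is SegState.PENDING:
                self.state[seg] = SegState.IN_PROGRESS
                batch.append(seg)
        return batch

    def finish(self, seg, ok):
        # A failed segment goes back to PENDING so that it is retried later.
        self.state[seg] = SegState.DONE if ok else SegState.PENDING

    def archived_upto(self):
        """Return the last segment of the contiguous archived prefix, or None.

        Completions can arrive out of order, so a DONE segment that follows
        a gap does not advance this point.
        """
        last = None
        for seg in self.order:
            if self.state[seg] is not SegState.DONE:
                break
            last = seg
        return last


if __name__ == "__main__":
    t = ArchiveTracker(["000000010000000000000001",
                        "000000010000000000000002",
                        "000000010000000000000003"])
    a = t.claim()   # worker A takes ...01
    b = t.claim(2)  # worker B takes ...02 and ...03 as one batch
    for seg in b:
        t.finish(seg, True)
    print(t.archived_upto())  # None: ...01 is still in progress
    t.finish(a[0], True)
    print(t.archived_upto())  # 000000010000000000000003
```

The IN_PROGRESS state is the only extra knowledge parallel commands need; the out-of-order completion handling in archived_upto() is needed either way.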

If we start with parallelism, the whole ecosystem could immediately
benefit from it, with existing archive commands working as is. To be
able to handle multiple files in a single command, we would need some
way to let the server know which files were successfully archived and
which weren't, which requires a different communication channel than
the command's return code.
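As an illustration of why the return code is not enough, here is a minimal sketch of one possible per-file reporting scheme. The protocol is an assumption, not something PostgreSQL defines: the command receives the WAL file names as arguments and echoes each successfully archived name on its own stdout line. run_batch() and the fake archiver are hypothetical.

```python
import subprocess
import sys


def run_batch(command, wal_files):
    """Run a hypothetical batched archive command; return (done, failed).

    Assumed protocol, for illustration only: the command receives the WAL
    file names as extra arguments and prints the name of every file it
    archived successfully, one per line, on stdout. A single exit code
    cannot say which files failed, so anything not echoed back is treated
    as not archived.
    """
    proc = subprocess.run(command + list(wal_files),
                          capture_output=True, text=True)
    reported = set(proc.stdout.split())
    done = [f for f in wal_files if f in reported]
    failed = [f for f in wal_files if f not in reported]
    return done, failed


if __name__ == "__main__":
    # Fake archiver that "fails" on any file whose name ends in "2".
    fake = [sys.executable, "-c",
            "import sys\n"
            "for f in sys.argv[1:]:\n"
            "    if not f.endswith('2'):\n"
            "        print(f)\n"]
    print(run_batch(fake, ["000000010000000000000001",
                           "000000010000000000000002",
                           "000000010000000000000003"]))
```

Treating "not reported" as "failed" keeps the scheme safe: a command that crashes halfway simply causes the unreported files to be retried.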

But as I said, I'm not convinced that archive_command is the best
interface for that. If I understand correctly, most backup solutions
would prefer to have a daemon launched and use it as a queuing system.
Wouldn't it be better to have a new archive_mode, e.g. "daemon", and
make postgres responsible for (re)starting it, passing information
through the daemon's stdin/stdout or something like that?
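A rough sketch of what such a "daemon" mode could look like from the server side, under loudly stated assumptions: archive_mode = "daemon" is only a proposal here, and the line protocol (the server writes one WAL file name per line to stdin, the daemon answers "OK <name>" or "FAIL <name>" on stdout) is hypothetical, as are ArchiveDaemon and its methods. The supervisor restarts the daemon whenever it finds it has exited.

```python
import subprocess
import sys


class ArchiveDaemon:
    """Hypothetical supervisor for a proposed archive_mode = "daemon".

    Assumed line protocol (illustration only): the server writes one WAL
    file name per line to the daemon's stdin; the daemon answers with
    "OK <name>" or "FAIL <name>" on stdout once it is done with it.
    """

    def __init__(self, argv):
        self.argv = argv
        self.proc = None

    def _ensure_running(self):
        # (Re)start the daemon if it was never started or has exited.
        if self.proc is None or self.proc.poll() is not None:
            self.proc = subprocess.Popen(
                self.argv,
                stdin=subprocess.PIPE,
                stdout=subprocess.PIPE,
                text=True,
                bufsize=1,  # line-buffered in text mode
            )

    def archive(self, wal_file):
        """Send one file name and wait for its acknowledgement."""
        self._ensure_running()
        try:
            self.proc.stdin.write(wal_file + "\n")
            self.proc.stdin.flush()
            reply = self.proc.stdout.readline().split()
        except BrokenPipeError:
            return False  # daemon died; it is restarted on the next call
        # An empty reply (daemon exited) also counts as a failure.
        return reply == ["OK", wal_file]

    def stop(self):
        if self.proc is not None and self.proc.poll() is None:
            self.proc.stdin.close()  # EOF tells the daemon to exit
            self.proc.wait()
            self.proc.stdout.close()


if __name__ == "__main__":
    # Fake daemon: acknowledges every file except names ending in "2".
    fake = [sys.executable, "-c",
            "import sys\n"
            "while True:\n"
            "    name = sys.stdin.readline().strip()\n"
            "    if not name:\n"
            "        break\n"
            "    status = 'FAIL' if name.endswith('2') else 'OK'\n"
            "    print(status, name, flush=True)\n"]
    d = ArchiveDaemon(fake)
    for seg in ["000000010000000000000001", "000000010000000000000002"]:
        print(seg, d.archive(seg))
    d.stop()
```

For brevity this waits for each acknowledgement in turn; a daemon used as a real queue would let the server write many names ahead and match replies by file name, which the "OK <name>" format already allows.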
