parallelizing the archiver

From: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: parallelizing the archiver
Date: 2021-09-07 22:36:18
Message-ID: BC4D6BB2-6976-4397-A417-A6A30EEDC63E@amazon.com
Lists: pgsql-hackers

Hi hackers,

I'd like to gauge interest in parallelizing the archiver process.
From a quick scan, I was only able to find one recent thread [0] that
brought up this topic, and ISTM the conventional wisdom is to use a
backup utility like pgBackRest that handles archiving in parallel
behind the scenes. My experience is that the generating-more-WAL-
than-we-can-archive problem is pretty common, and parallelization
seems to help quite a bit, so perhaps it's a good time to consider
directly supporting parallel archiving in PostgreSQL.

Based on previous threads I've seen, I believe many in the community
would like to replace archive_command entirely, but what I'm proposing
here would build on the existing tools. I'm currently thinking of
something a bit like autovacuum_max_workers, but the archive workers
would be created once and would follow a competing-consumers model.
Another approach I'm looking at is to use background worker processes,
although I'm not sure whether tying such a critical piece of
functionality to max_worker_processes is a good idea. That said,
logical replication already uses background workers, so there is
precedent.
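
To make the background worker idea concrete, here's a rough sketch of
what I have in mind. To be clear, archive_max_workers,
LaunchArchiveWorkers, and ArchiveWorkerMain are all made-up names, and
the file-claiming logic is elided; the point is just that the existing
bgworker API already supports the competing-consumers shape:

    /*
     * Hypothetical sketch only.  LaunchArchiveWorkers() would be
     * called during postmaster startup to register one bgworker per
     * configured archive worker.
     */
    #include "postgres.h"

    #include "miscadmin.h"
    #include "postmaster/bgworker.h"
    #include "storage/latch.h"
    #include "utils/wait_event.h"

    static void
    LaunchArchiveWorkers(int archive_max_workers)
    {
        for (int i = 0; i < archive_max_workers; i++)
        {
            BackgroundWorker worker;

            memset(&worker, 0, sizeof(worker));
            worker.bgw_flags = BGWORKER_SHMEM_ACCESS;
            worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
            worker.bgw_restart_time = 5;    /* restart 5s after a crash */
            snprintf(worker.bgw_name, BGW_MAXLEN, "archive worker %d", i);
            snprintf(worker.bgw_library_name, BGW_MAXLEN, "postgres");
            snprintf(worker.bgw_function_name, BGW_MAXLEN,
                     "ArchiveWorkerMain");
            worker.bgw_main_arg = Int32GetDatum(i);
            RegisterBackgroundWorker(&worker);
        }
    }

    void
    ArchiveWorkerMain(Datum main_arg)
    {
        BackgroundWorkerUnblockSignals();

        for (;;)
        {
            /*
             * Competing consumers: atomically claim the oldest .ready
             * file in pg_wal/archive_status (e.g., rename it to an
             * in-progress suffix so no other worker grabs it), run
             * archive_command on the segment, then mark it .done.
             */

            /* Nap until the next segment might be ready. */
            (void) WaitLatch(MyLatch,
                             WL_LATCH_SET | WL_TIMEOUT |
                             WL_EXIT_ON_PM_DEATH,
                             1000L,
                             WAIT_EVENT_ARCHIVER_MAIN);
            ResetLatch(MyLatch);
        }
    }

One nice property of this shape is that a slow archive_command
invocation for one segment doesn't hold up the other workers.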

Anyway, I'm curious what folks think about this. I think it'd help
simplify server administration for many users.

Nathan

[0] https://www.postgresql.org/message-id/flat/20180828060221.x33gokifqi3csjj4%40depesz.com
