Re: design for parallel backup

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: design for parallel backup
Date: 2020-04-20 18:09:06
Message-ID: CA+TgmobYon-NeverP58V9E8beqf7TgrZQE0DqDpb+CW5pf7F7A@mail.gmail.com
Lists: pgsql-hackers

On Mon, Apr 20, 2020 at 8:50 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> It is not apparent how you are envisioning this division on the
> server-side. I think in the currently proposed patch, each worker on
> the client-side requests the specific files. So, how are workers going
> to request such numbered files and how we will ensure that the work
> division among workers is fair?

I think that the workers would just say "give me my share of the base
backup" and then the server would divide up the files as it went. It
would probably keep a queue of whatever files still need to be
processed in shared memory and each process would pop items from the
queue to send to its client.
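
To make the queue idea concrete, here is a minimal sketch, not a
proposal for the actual implementation. It assumes Python threads as a
stand-in for server processes and an in-process queue as a stand-in for
shared memory; run_backup and the "sent" bookkeeping are hypothetical
names invented for illustration.

```python
import queue
import threading


def run_backup(files, n_workers):
    """Hand out files from one shared queue; return {worker_id: [files sent]}."""
    # Stand-in for the shared-memory queue of files still to be processed.
    pending = queue.Queue()
    for name in files:
        pending.put(name)

    # Each worker only appends to its own list, so no extra locking is needed.
    sent = {i: [] for i in range(n_workers)}

    def worker(worker_id):
        while True:
            try:
                name = pending.get_nowait()  # pop the next unprocessed file
            except queue.Empty:
                return  # nothing left to send
            # Stand-in for streaming the file to this worker's client.
            sent[worker_id].append(name)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sent
```

Because every worker pulls its next file only when it has finished the
previous one, a worker stuck on a large file simply takes fewer items,
and no client has to know file names in advance.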

> I think it also depends to some extent what we decide in the nearby
> thread [1] related to support of compression/encryption. Say, if we
> want to support a new compression on client-side then we need to
> anyway process the contents of each tar file in which case combining
> into single tar file might be okay but not sure what is the right
> thing here. I think this part needs some more thoughts.

Yes, it needs more thought, but the central idea is to try to create
something that is composable. For example, if we have code to do LZ4
compression, and code to do GPG encryption, then we should be able to
do both without adding any more code. Ideally, we should also be able
to do either of those operations on either the client side or the
server side, using the same code either way.
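
As a rough illustration of what "composable" could mean here, consider
the sketch below. It is only a sketch under loud assumptions: zlib
stands in for LZ4, and a toy XOR function stands in for GPG (it is not
real encryption). The names compose and xor_cipher are hypothetical.

```python
import zlib


def compress(data: bytes) -> bytes:
    """Compression stage (zlib here, as a stand-in for LZ4)."""
    return zlib.compress(data)


def decompress(data: bytes) -> bytes:
    """Inverse of compress()."""
    return zlib.decompress(data)


def xor_cipher(key: bytes):
    """Build a toy, self-inverse 'encryption' stage. NOT secure; a stand-in for GPG."""
    def apply(data: bytes) -> bytes:
        return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))
    return apply


def compose(*stages):
    """Chain bytes -> bytes stages, applied left to right."""
    def run(data: bytes) -> bytes:
        for stage in stages:
            data = stage(data)
        return data
    return run


# Stages combine without any stage knowing about the others.
encode = compose(compress, xor_cipher(b"k3y"))
decode = compose(xor_cipher(b"k3y"), decompress)
```

Each stage is just a function from bytes to bytes, so nothing in it
cares whether it runs on the client or on the server, and adding a new
stage does not require touching the existing ones.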

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
