Re: WIP/PoC for parallel backup

From: Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP/PoC for parallel backup
Date: 2020-04-02 15:16:57
Message-ID: CADM=JeiGCVEzLY4WmAX6PxtzZ8c5oo46F7rKOxXuOsMTSELCjQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 2, 2020 at 4:47 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Fri, Mar 27, 2020 at 1:34 PM Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>
> wrote:
> > Yes, we are fetching a single file. However, SEND_FILES is still capable
> of fetching multiple files in one
> > go, that's why the name.
>
> I don't see why it should work that way. If we're fetching individual
> files, why have an unused capability to fetch multiple files?
>

Okay will rename and will modify the function to send a single file as well.

> > 1- parallel backup does not work with a standby server. In parallel
> backup, the server
> > spawns multiple processes and there is no shared state being maintained.
> So currently,
> > no way to tell multiple processes if the standby was promoted during the
> backup since
> > the START_BACKUP was called.
>
> Why would you need to do that? As long as the process where
> STOP_BACKUP can do the check, that seems good enough.
>

Yes, but the user will get the error only after the STOP_BACKUP, not while
the backup is
in progress. So if the backup is a large one, early error detection would
be much beneficial.
This is the current behavior of non-parallel backup as well.

>
> > 2- throttling. Robert previously suggested that we implement throttling
> on the client-side.
> > However, I found a previous discussion where it was advocated to be
> added to the
> > backend instead[1].
> >
> > So, it was better to have a consensus before moving the throttle
> function to the client.
> > That’s why for the time being I have disabled it and have asked for
> suggestions on it
> > to move forward.
> >
> > It seems to me that we have to maintain a shared state in order to
> support taking backup
> > from standby. Also, there is a new feature recently committed for backup
> progress
> > reporting in the backend (pg_stat_progress_basebackup). This
> functionality was recently
> > added via this commit ID: e65497df. For parallel backup to update these
> stats, a shared
> > state will be required.
>
> I've come around to the view that a shared state is a good idea and
> that throttling on the server-side makes more sense. I'm not clear on
> whether we need shared state only for throttling or whether we need it
> for more than that. Another possible reason might be for the
> progress-reporting stuff that just got added.
>

Okay, then I will add the shared state. And since we are adding the shared
state, we can use
that for throttling, progress-reporting and standby early error checking.

> > Since multiple pg_basebackup can be running at the same time,
> maintaining a shared state
> > can become a little complex, unless we disallow taking multiple parallel
> backups.
>
> I do not see why it would be necessary to disallow taking multiple
> parallel backups. You just need to have multiple copies of the shared
> state and a way to decide which one to use for any particular backup.
> I guess that is a little complex, but only a little.
>

There are two possible options:

(1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
(2) (Preferred Option) Use the WAL start location as the BackupID.

This BackupID should be given back as a response to start backup command.
All client workers

must append this ID to all parallel backup replication commands. So that we
can use this identifier

to search for that particular backup. Does that sound good?

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2020-04-02 15:24:36 Re: [HACKERS] WAL logging problem in 9.4.3?
Previous Message James Coleman 2020-04-02 15:04:20 Re: Proposal: Expose oldest xmin as SQL function for monitoring