Re: WIP/PoC for parallel backup

From: Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP/PoC for parallel backup
Date: 2019-09-27 16:00:01
Message-ID: CADM=JejdsyYi6W4JOOX=D4WLfM0e9zLLQTGgajZ4mFCFvhf-Dw@mail.gmail.com
Lists: pgsql-hackers

Hi Robert,

Thanks for the feedback. Please see the comments below:

On Tue, Sep 24, 2019 at 10:53 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Wed, Aug 21, 2019 at 9:53 AM Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>
> wrote:
> > - BASE_BACKUP [PARALLEL] - returns a list of files in PGDATA
> > If the parallel option is there, then it will only do pg_start_backup,
> scan PGDATA, and send a list of file names.
>
> So IIUC, this would mean that BASE_BACKUP without PARALLEL returns
> tarfiles, and BASE_BACKUP with PARALLEL returns a result set with a
> list of file names. I don't think that's a good approach. It's too
> confusing to have one replication command that returns totally
> different things depending on whether some option is given.
>

Sure. I will add a separate command (START_BACKUP) for parallel.

> > - SEND_FILES_CONTENTS (file1, file2,...) - returns the files in the given
> list.
> > pg_basebackup will then send back a list of filenames in this command.
> This command will be sent by each worker, and that worker will receive
> the listed files.
>
> Seems reasonable, but I think you should just pass one file name and
> use the command multiple times, once per file.
>

I considered this approach initially; however, I adopted the current
strategy to avoid multiple round trips between the server and clients and
to save on query-processing time by issuing a single command rather than
many. Furthermore, fetching multiple files at once will aid in supporting
the tar format by utilising the existing ReceiveTarFile() function, making
it possible to create one tarball per tablespace per worker.
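To make the intent concrete, here is a rough sketch (in Python, purely
illustrative; the real code is C inside pg_basebackup) of building one
batched SEND_FILES_CONTENTS command per worker instead of one round trip
per file. The helper name and the file paths are made up for the example:

```python
# Hypothetical sketch: divide the file list into one batch per worker and
# build a single SEND_FILES_CONTENTS command per batch, so each worker
# issues one command instead of one command per file.

def build_fetch_commands(files, num_workers):
    """Return one SEND_FILES_CONTENTS command string per worker."""
    # Simple striping of the file list across workers.
    batches = [files[i::num_workers] for i in range(num_workers)]
    return ["SEND_FILES_CONTENTS (%s)" % ", ".join(batch)
            for batch in batches if batch]

cmds = build_fetch_commands(["base/1/1249", "base/1/1259", "pg_hba.conf"], 2)
# Two workers, two commands, three files total.
```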

>
> > - STOP_BACKUP
> > when all workers finish then, pg_basebackup will send STOP_BACKUP
> command.
>
> This also seems reasonable, but surely the matching command should
> then be called START_BACKUP, not BASEBACKUP PARALLEL.
>
> > I have done a basic proof of concept (PoC), which is also attached. I
> would appreciate some input on this. So far, I am simply dividing the list
> equally and assigning them to worker processes. I intend to fine-tune this
> by taking into consideration file sizes. Further, to add tar format support,
> I am considering that each worker process processes all files belonging to
> a tablespace in its list (i.e. creates and copies the tar file) before it
> processes the next tablespace. As a result, this will create tar files that
> are disjoint with respect to tablespace data. For example:
>
> Instead of doing this, I suggest that you should just maintain a list
> of all the files that need to be fetched and have each worker pull a
> file from the head of the list and fetch it when it finishes receiving
> the previous file. That way, if some connections go faster or slower
> than others, the distribution of work ends up fairly even. If you
> instead pre-distribute the work, you're guessing what's going to
> happen in the future instead of just waiting to see what actually does
> happen. Guessing isn't intrinsically bad, but guessing when you could
> be sure of doing the right thing *is* bad.
>
> If you want to be really fancy, you could start by sorting the files
> in descending order of size, so that big files are fetched before
> small ones. Since the largest possible file is 1GB and any database
> where this feature is important is probably hundreds or thousands of
> GB, this may not be very important. I suggest not worrying about it
> for v1.
>

Ideally, I would like to support the tar format as well, which would be
much easier to implement when fetching multiple files at once, since that
would enable reusing the existing functionality without much change.

Your idea of sorting the files in descending order of size seems very
appealing. I think we can do this and distribute the files among the
workers round-robin, i.e. the first file in the list goes to worker 1, the
second to worker 2, and so on.
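As a sketch of that distribution (Python, illustrative only; the sizes
and names below are made up), sort by size descending and deal the files
out round-robin so every worker starts with one of the largest files:

```python
# Sketch: sort files by size in descending order, then assign them to
# workers round-robin, so the biggest files are started first.

def distribute(files_with_sizes, num_workers):
    """files_with_sizes: list of (name, size). Returns per-worker name lists."""
    ordered = sorted(files_with_sizes, key=lambda f: f[1], reverse=True)
    assignments = [[] for _ in range(num_workers)]
    for i, (name, _size) in enumerate(ordered):
        assignments[i % num_workers].append(name)
    return assignments

work = distribute([("a", 10), ("b", 900), ("c", 450)], 2)
# Worker 1 gets "b" (largest) then "a"; worker 2 gets "c".
```

This is the static pre-distribution described above; a dynamic pull-from-
queue scheme, as suggested earlier in the thread, would replace the
round-robin loop with workers taking the next file as they finish.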

>
> > Say, tablespace t1 has 20 files and we have 5 worker processes and
> tablespace t2 has 10. Ignoring all other factors for the sake of this
> example, each worker process will get a group of 4 files of t1 and 2 files
> of t2. Each process will create 2 tar files, one for t1 containing 4 files
> and another for t2 containing 2 files.
>
> This is one of several possible approaches. If we're doing a
> plain-format backup in parallel, we can just write each file where it
> needs to go and call it good. But, with a tar-format backup, what
> should we do? I can see three options:
>
> 1. Error! Tar format parallel backups are not supported.
>
> 2. Write multiple tar files. The user might reasonably expect that
> they're going to end up with the same files at the end of the backup
> regardless of whether they do it in parallel. A user with this
> expectation will be disappointed.
>
> 3. Write one tar file. In this design, the workers have to take turns
> writing to the tar file, so you need some synchronization around that.
> Perhaps you'd have N threads that read and buffer a file, and N+1
> buffers. Then you have one additional thread that reads the complete
> files from the buffers and writes them to the tar file. There's
> obviously some possibility that the writer won't be able to keep up
> and writing the backup will therefore be slower than it would be with
> approach (2).
>
> There's probably also a possibility that approach (2) would thrash the
> disk head back and forth between multiple files that are all being
> written at the same time, and approach (3) will therefore win by not
> thrashing the disk head. But, since spinning media are becoming less
> and less popular and are likely to have multiple disk heads under the
> hood when they are used, this is probably not too likely.
>
> I think your choice to go with approach (2) is probably reasonable,
> but I'm not sure whether everyone will agree.
>

Yes, for tar format support, approach (2) is what I had in mind. I am
currently working on the implementation and will share the patch in a
couple of days.
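For clarity, a minimal sketch of approach (2) in Python (illustrative
only; the function name, worker ids, and paths are invented, and the real
implementation belongs in pg_basebackup's C code): each worker writes its
own tar archive, one per tablespace in its assigned list.

```python
# Sketch of approach (2): one tar file per (tablespace, worker) pair,
# e.g. base_1.tar for tablespace "base" written by worker 1.
import os
import tarfile
import tempfile

def write_worker_tars(worker_id, files_by_tablespace, out_dir):
    """Create one tar archive per tablespace in this worker's list."""
    created = []
    for tbs, paths in files_by_tablespace.items():
        tar_path = os.path.join(out_dir, "%s_%d.tar" % (tbs, worker_id))
        with tarfile.open(tar_path, "w") as tar:
            for p in paths:
                tar.add(p, arcname=os.path.basename(p))
        created.append(tar_path)
    return created

# Demo with a throwaway file standing in for a relation file.
with tempfile.TemporaryDirectory() as d:
    rel = os.path.join(d, "relfile")
    with open(rel, "w") as f:
        f.write("data")
    tars = write_worker_tars(1, {"base": [rel]}, d)
```

With this scheme a parallel tar-format backup yields multiple archives
per tablespace (one per worker), which is the user-visible difference
from a non-parallel backup noted above.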

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
