Re: WIP/PoC for parallel backup

From: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>
To: asifr(dot)rehman(at)gmail(dot)com
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP/PoC for parallel backup
Date: 2019-10-07 08:52:36
Message-ID: CAGPqQf1LkAz4b_EZn6KZW=20WAxNipgV9-BEoMQsLkZe1FqKjg@mail.gmail.com
Lists: pgsql-hackers

Thanks, Asif, for the patch. I am signing up to review it. The patch is
a bit big, so here are some very initial comments to make the review
process easier.

1) The patch does a lot of code shuffling. I think it would be easier
to review if you broke the cleanup out into a separate patch.

Examples:
a: setup_throttle
b: include_wal_files

2) As far as I can see, this patch basically has three major parts:

a) Introducing new commands like START_BACKUP, SEND_FILES_CONTENT and
STOP_BACKUP (a rough usage sketch follows below).
b) Implementation of actual parallel backup.
c) Test cases

I would suggest breaking these out into three separate patches; that
would make the patch easier to review.
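
For (a), to make sure I am reading the proposed grammar correctly, here
is a rough sketch of how a backup client might drive the new commands
over a replication connection. This is only my illustration: the exact
command and option spellings are assumptions based on your description
and the existing base_backup_opt_list, not code taken from the patch.

    /* Sketch only -- command spellings assumed, not taken from the patch. */
    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn     *conn = PQconnectdb("replication=true dbname=postgres");
        PGresult   *res;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "%s", PQerrorMessage(conn));
            return 1;
        }

        /* Start the backup; per the description, this returns the file
         * list sorted by size, largest first. */
        res = PQexec(conn, "START_BACKUP LABEL 'b1' FAST");
        /* ... read the file list out of res ... */
        PQclear(res);

        /* Each worker then asks for a batch of files by name. */
        res = PQexec(conn,
                     "SEND_FILES_CONTENT ('base/1/1259', 'base/1/1249')");
        /* ... receive and write out the file contents ... */
        PQclear(res);

        /* Finish the backup. */
        res = PQexec(conn, "STOP_BACKUP");
        PQclear(res);
        PQfinish(conn);
        return 0;
    }

(I assume START_BACKUP/STOP_BACKUP are issued once by the leader and
only SEND_FILES_CONTENT is issued by each worker.)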

3) In your patch you are preparing a backup manifest (a file giving
information about the data files). Robert Haas submitted a backup
manifest patch on another thread [1], and I think we should use that
patch to get the backup manifest for parallel backup.

I will continue to review the patch, but in the meantime, please break
up the patches as above so that the review process is easier.

[1]
https://www.postgresql.org/message-id/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com

Thanks,

On Fri, Oct 4, 2019 at 4:32 PM Asif Rehman <asifr(dot)rehman(at)gmail(dot)com> wrote:

>
>
> On Thu, Oct 3, 2019 at 6:40 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
>> On Fri, Sep 27, 2019 at 12:00 PM Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>
>> wrote:
>> >> > - SEND_FILES_CONTENTS (file1, file2,...) - returns the files in
>> >> > the given list.
>> >> > pg_basebackup will then send back a list of filenames in this
>> >> > command. This command will be sent by each worker, and that worker
>> >> > will receive the said files.
>> >>
>> >> Seems reasonable, but I think you should just pass one file name and
>> >> use the command multiple times, once per file.
>> >
>> > I considered this approach initially; however, I adopted the current
>> > strategy to avoid multiple round trips between the server and clients
>> > and to save on query processing time by issuing a single command
>> > rather than multiple ones. Further, fetching multiple files at once
>> > will also aid in supporting the tar format, by utilising the existing
>> > ReceiveTarFile() function, and will make it possible to create one
>> > tarball per tablespace per worker.
>>
>> I think that sending multiple filenames on a line could save some time
>> when there are lots of very small files, because then the round-trip
>> overhead could be significant.
>>
>> However, if you've got mostly big files, I think this is going to be a
>> loser. It'll be fine if you're able to divide the work exactly evenly,
>> but that's pretty hard to do, because some workers may succeed in
>> copying the data faster than others for a variety of reasons: some
>> data is in memory, some data has to be read from disk, different data
>> may need to be read from different disks that run at different speeds,
>> not all the network connections may run at the same speed. Remember
>> that the backup's not done until the last worker finishes, and so
>> there may well be a significant advantage in terms of overall speed in
>> putting some energy into making sure that they finish as close to each
>> other in time as possible.
>>
>> To put that another way, the first time all the workers except one get
>> done while the last one still has 10GB of data to copy, somebody's
>> going to be unhappy.
>>
>
> I have updated the patch (see the attached patch) to include tablespace
> support, tar format support, and all other base backup options, so that
> they work in parallel mode as well. As previously suggested, I have
> removed BASE_BACKUP [PARALLEL] and added START_BACKUP instead to start
> the backup. The tar format will write multiple tar files depending upon
> the number of workers specified. I have also made all commands
> (START_BACKUP/SEND_FILES_CONTENT/STOP_BACKUP) accept the
> base_backup_opt_list, so that the command-line options can also be
> provided to these commands. Since the command-line options don't change
> once the backup initiates, I went this way instead of storing them in
> shared state.
>
> The START_BACKUP command will now return a list of files sorted by size
> in descending order, so the larger files are at the top of the list.
> These files are then assigned to workers one by one, so that the larger
> files are copied before the others.
>
> As I understand it, your main concern is that the files won't be
> distributed fairly, i.e. one worker might get a big file and take more
> time while the others finish early with smaller files. In this approach
> the list of files is sorted in descending order by size, so all the big
> files come first. The maximum file size in PG is 1GB, so if we have four
> workers picking up files from the list one by one, the worst-case
> scenario is that one worker gets a 1GB file to process while the others
> get smaller files. However, with this descending-size ordering and
> one-by-one handout to workers, there is a very high likelihood of the
> work being distributed evenly. Does this address your concerns?
>
> Furthermore, the patch also includes a regression test. Since
> t/010_pg_basebackup.pl tests base backup comprehensively, I have
> duplicated it as "t/040_pg_basebackup_parallel.pl" and added the
> parallel option to all of its tests, to make sure parallel mode works
> as expected. The one thing that differs from base backup is checksum
> failure reporting. In parallel mode the total number of checksum
> failures is not reported correctly; however, the backup is aborted
> whenever a checksum failure occurs. This is because the processes do
> not maintain any shared state. I assume that reporting the total number
> of failures is less important than noticing a failure and aborting.
>
>
> --
> Asif Rehman
> Highgo Software (Canada/China/Pakistan)
> URL : www.highgo.ca
>
>
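
One note on the descending-size handout described in the quoted mail:
it is essentially the classic largest-task-first heuristic, and with
one-at-a-time assignment the worst-case skew between workers is bounded
by a single file, i.e. at most 1GB (the segment size). A minimal sketch
of that assignment step, with made-up names and a plain index standing
in for whatever shared state or coordination the patch actually uses:

    /* Sketch only, not code from the patch. */
    #include <stddef.h>

    typedef struct
    {
        const char *path;
        size_t      size;
    } BackupFile;

    typedef struct
    {
        BackupFile *files;      /* pre-sorted by size, descending */
        int         nfiles;
        int         next;       /* index of the next unassigned file */
    } FileQueue;

    /*
     * Called by a worker each time it finishes its current file.  The
     * real workers are separate processes, so this would need locking
     * or a coordinator handing out the filenames.
     */
    static BackupFile *
    grab_next_file(FileQueue *q)
    {
        if (q->next >= q->nfiles)
            return NULL;
        return &q->files[q->next++];
    }

A worker that drew a big file simply takes fewer turns afterwards,
which is what makes the even-distribution argument plausible.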

--
Rushabh Lathia
