Re: WIP/PoC for parallel backup

From: Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>
To: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP/PoC for parallel backup
Date: 2020-02-25 14:18:42
Message-ID: CADM=Jej4UxVHkR-gxV0eY0TkFeZAMKkYqQhY+kwbO12iSam+0Q@mail.gmail.com
Lists: pgsql-hackers

Hi,

I have created a commitfest entry.
https://commitfest.postgresql.org/27/2472/

On Mon, Feb 17, 2020 at 1:39 PM Asif Rehman <asifr(dot)rehman(at)gmail(dot)com> wrote:

> Thanks Jeevan. Here is the documentation patch.
>
> On Mon, Feb 10, 2020 at 6:49 PM Jeevan Chalke <
> jeevan(dot)chalke(at)enterprisedb(dot)com> wrote:
>
>> Hi Asif,
>>
>> On Thu, Jan 30, 2020 at 7:10 PM Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>
>> wrote:
>>
>>>
>>> Here are the updated patches, taking care of the issues pointed out
>>> earlier. This patch adds the following commands (with the specified
>>> options):
>>>
>>> START_BACKUP [LABEL '<label>'] [FAST]
>>> STOP_BACKUP [NOWAIT]
>>> LIST_TABLESPACES [PROGRESS]
>>> LIST_FILES [TABLESPACE]
>>> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
>>> SEND_FILES '(' FILE, FILE... ')' [START_WAL_LOCATION 'X/X']
>>> [NOVERIFY_CHECKSUMS]
>>>
>>>
>>> Parallel backup does not make any use of the tablespace map, so I have
>>> removed that option from the above commands. There is a patch pending
>>> to remove the exclusive backup; at that time we can further refactor the
>>> do_pg_start_backup function to remove the tablespace information and
>>> move the creation of the tablespace_map file to the client.
>>>
>>>
>>> I have disabled the maxrate option for parallel backup. I intend to send
>>> out a separate patch for it. Robert previously suggested implementing
>>> throttling on the client side. I found the original email thread [1]
>>> where throttling was proposed and added to the server. In that thread,
>>> it was originally implemented on the client side, but per many
>>> suggestions it was moved to the server side.
>>>
>>> So, I have a few suggestions on how we can implement this:
>>>
>>> 1- Have another option for pg_basebackup (i.e. per-worker-maxrate) where
>>> the user could choose the bandwidth allocation for each worker. This
>>> approach can be implemented on the client side as well as on the
>>> server side.
>>>
>>> 2- Have the maxrate divided equally among the workers at first, and then
>>> let the main thread keep adjusting it whenever one of the workers
>>> finishes. I believe this would only be possible if we handle throttling
>>> on the client. Also, as I understand it, implementing this will
>>> introduce an additional mutex to protect the bandwidth consumption data,
>>> so that the rate can be adjusted according to the data received by the
>>> threads.
>>>
>>> [1]
>>> https://www.postgresql.org/message-id/flat/521B4B29.20009%402ndquadrant.com#189bf840c87de5908c0b4467d31b50af
>>>
>>> --
>>> Asif Rehman
>>> Highgo Software (Canada/China/Pakistan)
>>> URL : www.highgo.ca
>>>
>>>
>>
>> The latest changes look good to me. However, the patch set is missing the
>> documentation. Please add it.
>>
>> Thanks
>>
>> --
>> Jeevan Chalke
>> Associate Database Architect & Team Lead, Product Development
>> EnterpriseDB Corporation
>> The Enterprise PostgreSQL Company
>>
>>
>
> --
> Asif Rehman
> Highgo Software (Canada/China/Pakistan)
> URL : www.highgo.ca
>
>
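
For what it's worth, option 2 from the quoted mail above (split the maxrate
equally, then rebalance when a worker finishes) could be sketched roughly as
below. This is only an illustration under my own assumptions and is not code
from the patch: RateBudget and the budget_* functions are hypothetical names,
the rate is assumed to be in kB/s as for pg_basebackup's --max-rate, and each
worker simply asks for its current share instead of the main thread pushing
updates to it.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical shared state; none of these names exist in the patch. */
typedef struct RateBudget
{
	pthread_mutex_t lock;		/* protects the two fields below */
	int			maxrate_kbps;	/* total limit for the whole backup, kB/s */
	int			active_workers; /* workers that are still transferring */
} RateBudget;

/* Set up the budget before any worker thread is started. */
static void
budget_init(RateBudget *b, int maxrate_kbps, int nworkers)
{
	assert(maxrate_kbps > 0 && nworkers > 0);
	pthread_mutex_init(&b->lock, NULL);
	b->maxrate_kbps = maxrate_kbps;
	b->active_workers = nworkers;
}

/*
 * Current share of one worker: the total limit divided equally among the
 * workers that are still active (integer division, remainder is dropped).
 * Returns 0 once every worker has finished.
 */
static int
budget_worker_rate(RateBudget *b)
{
	int			rate = 0;

	pthread_mutex_lock(&b->lock);
	if (b->active_workers > 0)
		rate = b->maxrate_kbps / b->active_workers;
	pthread_mutex_unlock(&b->lock);
	return rate;
}

/* Called when a worker finishes; the remaining workers' shares grow. */
static void
budget_worker_done(RateBudget *b)
{
	pthread_mutex_lock(&b->lock);
	if (b->active_workers > 0)
		b->active_workers--;
	pthread_mutex_unlock(&b->lock);
}
```

With a 32768 kB/s limit and 4 workers, each worker would start at 8192 kB/s,
and the shares of the remaining workers would go up as soon as any one of
them calls budget_worker_done(). The mutex here is the extra locking that
option 2 mentions.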

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
