Re: WIP/PoC for parallel backup

From: Hamid Akhtar <hamid(dot)akhtar(at)gmail(dot)com>
To: Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Ahsan Hadi <ahsan(dot)hadi(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, David Zhang <david(dot)zhang(at)highgo(dot)ca>, Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>, Kashif Zeeshan <kashif(dot)zeeshan(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP/PoC for parallel backup
Date: 2020-06-11 17:40:38
Message-ID: CANugjhuKbcRwZfqHepxm7uKNjEk6QRuSQaM1Om_6okJDjOwevA@mail.gmail.com
Lists: pgsql-hackers

As far as I understand, parallel backup is not a mandatory performance
feature but rather one used at the user's discretion. IMHO this indicates
that it will benefit some users and may not benefit others.

Taking a backup is an I/O-intensive workload, so parallelizing it through
multiple worker threads/processes creates an overhead of its own. So what
precisely are we optimizing here? Looking at a running database system in
any environment, I see the following potential scenarios playing out. These
are probably clear to everyone here, but I'm listing them for completeness
and clarity.

Locally Running Backup:
(1) Server has no clients connected other than the base backup.
(2) Server has other clients connected which are actively performing
operations causing disk I/O.

Remotely Running Backup:
(3) Server has no clients connected other than the remote base backup.
(4) Server has other clients connected which are actively performing
operations causing disk I/O.

Others:
(5) The server, or the system running the base backup, has other processes
competing for disk or network bandwidth.

Generally speaking, parallelization could potentially help in scenarios
(2), (4) and (5), because having more than one thread increases the
likelihood that the backup gets a bigger time slice for disk I/O and
network bandwidth. In (1) and (3), since there are no competing processes,
adding threads or processes will only increase CPU overhead while the
backup still gets the same network and disk time slice. In those cases,
performance will degrade.
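The argument above can be illustrated with a toy model. This is a minimal sketch under loudly stated assumptions: the shared disk or network is divided equally among all active streams (a simple fair-share scheduler, which real I/O schedulers such as CFQ only approximate), and each extra backup stream costs a fixed, hypothetical fraction of throughput in CPU and coordination overhead. The function names and numbers are illustrative only and are not taken from the patch.

```python
def backup_share(backup_streams: int, competing_streams: int) -> float:
    """Fraction of a shared resource the backup receives, ASSUMING the
    scheduler splits bandwidth equally among all active streams."""
    if backup_streams < 1 or competing_streams < 0:
        raise ValueError("need >= 1 backup stream and >= 0 competitors")
    return backup_streams / (backup_streams + competing_streams)


def effective_throughput(bandwidth: float, backup_streams: int,
                         competing_streams: int,
                         per_stream_overhead: float) -> float:
    """Backup throughput under the fair-share model, reduced by a
    hypothetical fixed overhead for every stream beyond the first."""
    share = backup_share(backup_streams, competing_streams)
    penalty = 1.0 - per_stream_overhead * (backup_streams - 1)
    return bandwidth * share * max(penalty, 0.0)


if __name__ == "__main__":
    # Scenarios (1)/(3): no competitors; (2)/(4): four competing streams.
    for competitors in (0, 4):
        for streams in (1, 4):
            t = effective_throughput(100.0, streams, competitors, 0.02)
            print(f"competitors={competitors} streams={streams} "
                  f"throughput={t:.1f}")
```

With no competitors, four streams still receive the whole resource, so the overhead makes them slightly slower than one stream. With four competing streams, four backup streams claim half the bandwidth instead of a fifth and come out more than twice as fast despite the overhead.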

IMHO, that's why adding other load on the server, perhaps by running
pgbench simultaneously, may show improved performance for parallel backup.
Also, running parallel backup on a local laptop more often than not yields
improved performance.

There are obviously other factors that may impact performance, such as the
type of I/O scheduler being used, whether CFQ or some other.

IMHO, parallel backup has obvious performance benefits, but we need to
ensure that users understand that there is potential for slower backup if
there is no competition for resources.

On Fri, May 22, 2020 at 11:03 AM Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com> wrote:

>
> On Thu, May 21, 2020 at 7:12 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
>>
>> So, basically, when we go from 1 process to 4, the additional
>> processes spend all of their time waiting rather than doing any useful
>> work, and that's why there is no performance benefit. Presumably, the
>> reason they spend all their time waiting for ClientRead/ClientWrite is
>> because the network between the two machines is saturated, so adding
>> more processes that are trying to use it at maximum speed just leads
>> to spending more time waiting for it to be available.
>>
>> Do we have the same results for the local backup case, where the patch
>> helped?
>>
>
> Here are the results for the local backup case (100GB data). Attaching
> the captured logs.
>
> The total number of events (pg_stat_activity) captured during local runs:
> - 82 events for normal backups
> - 31 events for parallel backups (-j 4)
>
> BaseBackupRead wait event numbers: (newly added)
> 24 - in normal backups
> 14 - in parallel backup (-j 4)
>
> ClientWrite wait event numbers:
> 8 - in normal backup
> 43 - in parallel backups
>
> ClientRead wait event numbers:
> 0 - ClientRead in normal backup
> 32 - ClientRead in parallel backups, across different processes.
>
>
> --
>
> Thanks & Regards,
> Suraj kharage,
> EnterpriseDB Corporation,
> The Postgres Database Company.
>

--
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950 EMAIL: mailto:hamid(dot)akhtar(at)highgo(dot)ca
SKYPE: engineeredvirus
