Re: WIP/PoC for parallel backup

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>
Cc: Ahsan Hadi <ahsan(dot)hadi(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>, David Zhang <david(dot)zhang(at)highgo(dot)ca>, Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>, Kashif Zeeshan <kashif(dot)zeeshan(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP/PoC for parallel backup
Date: 2020-05-21 13:41:54
Message-ID: CA+TgmoZjRFYiEKaVYSOYFrBnjyQe=PhrnhidpsrvLa6qLQ7Ugw@mail.gmail.com
Lists: pgsql-hackers

On Thu, May 21, 2020 at 2:06 AM Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com> wrote:
> Yes. My colleague Suraj tried this and here are the pg_stat_activity output files.
>
> Captured wait events after every 3 seconds during the backup for -
> 1: parallel backup for 100GB data with 4 workers (pg_stat_activity_j4_100GB.txt)
> 2: Normal backup (without parallel backup patch) for 100GB data (pg_stat_activity_normal_backup_100GB.txt)
>
> Here is the observation:
>
> The total number of events (pg_stat_activity) captured during above runs:
> - 314 events for normal backups
> - 316 events for parallel backups (-j 4)
>
> BaseBackupRead wait event numbers: (newly added)
> 37 - in normal backups
> 25 - in the parallel backup (-j 4)
>
> ClientWrite wait event numbers:
> 175 - in normal backup
> 1098 - in parallel backups
>
> ClientRead wait event numbers:
> 0 - ClientRead in normal backup
> 326 - ClientRead in parallel backups for diff processes. (all in idle state)

So, basically, when we go from 1 process to 4, the additional
processes spend all of their time waiting rather than doing any useful
work, and that's why there is no performance benefit. Presumably, the
reason they spend all their time waiting on ClientRead/ClientWrite is
that the network between the two machines is saturated, so adding
more processes that are trying to use it at maximum speed just leads
to spending more time waiting for it to be available.
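
A rough way to check this reading against the quoted numbers is to
divide each wait-event count by the number of captures in that run.
The sketch below does only that arithmetic. It assumes that "total
number of events" means the number of pg_stat_activity snapshots taken
(one every 3 seconds), which the quoted message does not state
outright; the names SAMPLES, waits_per_snapshot and report are
illustrative only.

```python
# Counts copied from the quoted message. Treating "total number of events"
# as the number of pg_stat_activity snapshots is an assumption.
SAMPLES = {
    "normal": {
        "snapshots": 314,
        "BaseBackupRead": 37,
        "ClientWrite": 175,
        "ClientRead": 0,
    },
    "parallel_j4": {
        "snapshots": 316,
        "BaseBackupRead": 25,
        "ClientWrite": 1098,
        "ClientRead": 326,
    },
}


def waits_per_snapshot(run):
    """Average number of backends seen in each wait event per snapshot."""
    n = run["snapshots"]
    return {
        event: count / n
        for event, count in run.items()
        if event != "snapshots"
    }


def report():
    """Per-run table of average waiters per snapshot, keyed by run name."""
    return {name: waits_per_snapshot(run) for name, run in SAMPLES.items()}


if __name__ == "__main__":
    for name, per_snapshot in report().items():
        for event, avg in per_snapshot.items():
            print(f"{name:12s} {event:15s} {avg:.2f}")
```

Under that assumption, the parallel run averages about 3.5 backends in
ClientWrite per snapshot (1098 / 316), against about 0.56 for the
normal backup (175 / 314), which is what one would expect if nearly
all of the 4 workers are blocked sending data at any given moment.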

Do we have the same results for the local backup case, where the patch helped?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
