Re: refactoring basebackup.c

From: Sumanta Mukherjee <sumanta(dot)mukherjee(at)enterprisedb(dot)com>
To: Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: refactoring basebackup.c
Date: 2020-05-13 11:54:15
Message-ID: CAMSJAirkQpdYwJkijPHuwXKHG-Okp9AK1m6xKx1wBxVGWJ_JSA@mail.gmail.com
Lists: pgsql-hackers

Hi Suraj,

I wanted to raise two points.

1. With a tar send size of 128 kB, the transfer rate is at most about
0.48 GB/sec. Do we know what buffer size is actually being used? That
could help us explain part of the puzzle.
2. Taking just the minimum of two runs makes it hard to justify the
performance numbers and to argue that the differences are not due to
noise. It might be better to run several experiments for each
configuration, then fit a basic linear model and report the standard
deviation. An "order statistic" such as min(X1, X2, ... , Xn) is
generally a biased estimator. Calculating the variance of such a biased
statistic is a bit tricky, so the results could be corrupted by noise.
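
To make both points concrete, here is a rough sketch (not part of the
original test setup) that converts `time`-style "real" values into
seconds, derives a transfer rate, and summarizes a set of repeated runs.
The helper names and the 200 GB figure taken from the test description
are my own assumptions for illustration.

```python
import re
import statistics

# Assumption: approximate data size quoted in the test description.
DATA_GB = 200


def to_seconds(t):
    """Convert a `time`-style string such as '6m54.299s' to seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", t)
    if m is None:
        raise ValueError(f"unrecognised time format: {t!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


def throughput_gb_per_sec(real_time, data_gb=DATA_GB):
    """Average transfer rate in GB/sec for one backup run."""
    return data_gb / to_seconds(real_time)


def summarize(real_times):
    """Report min, mean and sample standard deviation of repeated runs."""
    secs = [to_seconds(t) for t in real_times]
    return {
        "n": len(secs),
        "min": min(secs),
        "mean": statistics.mean(secs),
        # The sample standard deviation needs at least two runs.
        "stdev": statistics.stdev(secs) if len(secs) > 1 else float("nan"),
    }


# 128kB without the refactor patch, "real" time from the quoted table.
print(round(throughput_gb_per_sec("6m54.299s"), 2))  # prints 0.48
```

Passing the "real" times of all runs of one scenario to summarize()
would give a mean and standard deviation to report alongside the
minimum, which should make it easier to judge whether a difference
between two scenarios is larger than the run-to-run noise.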

With Regards,
Sumanta Mukherjee.
EnterpriseDB: http://www.enterprisedb.com

On Wed, May 13, 2020 at 9:31 AM Suraj Kharage <
suraj(dot)kharage(at)enterprisedb(dot)com> wrote:

> Hi,
>
> Did some performance testing by varying TAR_SEND_SIZE with Robert's
> refactor patch and without the patch to check the impact.
>
> Below are the details:
>
> *Backup type*: local backup using pg_basebackup
> *Data size*: Around 200GB (200 tables - each table around 1.05 GB)
> *TAR_SEND_SIZE values tested*: 8kB, 32kB (default value), 128kB, 1MB
> (1024kB)
>
> *Server details:*
> RAM: 500 GB
> CPU details:
>   Architecture: x86_64
>   CPU op-mode(s): 32-bit, 64-bit
>   Byte Order: Little Endian
>   CPU(s): 128
> Filesystem: ext4
>
> TAR_SEND_SIZE           8kB          32kB (default)  128kB        1024kB
>
> Without refactor patch
>   real                  10m22.718s   8m36.245s       6m54.299s    18m3.511s
>   user                  1m23.629s    1m8.471s        0m55.690s    1m38.197s
>   sys                   8m51.410s    7m21.520s       5m46.502s    9m36.517s
>
> With refactor patch (Robert's patch)
>   real                  10m11.350s   8m56.226s       7m26.678s    18m17.230s
>   user                  1m25.038s    1m9.774s        0m54.833s    1m42.749s
>   sys                   8m39.226s    7m41.032s       6m20.057s    9m53.704s
>
> The above numbers are taken from the minimum of two runs of each scenario.
>
> I can see that with TAR_SEND_SIZE set to 32kB or 128kB we get good
> performance, whereas with 1MB the backup takes about 2.5x as long.
>
> Please let me know your thoughts/suggestions on the same.
>
> --
>
> Thanks & Regards,
> Suraj kharage,
> EnterpriseDB Corporation,
> The Postgres Database Company.
>
