Re: backup manifests

From: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>
To: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>
Subject: Re: backup manifests
Date: 2019-11-22 09:57:53
Message-ID: CAGPqQf1C-X+=dOr6UpnSywDdZyn4oP5K0Hu=ghzx9KkwxRiOFw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you Jeevan for reviewing the patch.

On Thu, Nov 21, 2019 at 2:33 PM Jeevan Chalke <
jeevan(dot)chalke(at)enterprisedb(dot)com> wrote:

>
>
> On Tue, Nov 19, 2019 at 3:30 PM Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>
> wrote:
>
>>
>>
>> My colleague Suraj did testing and noticed the performance impact
>> with the checksums. On further testing, he found that specifically with
>> sha its more of performance impact.
>>
>> Please find below statistics:
>>
>> no of tables without checksum SHA256
>> checksum % performnce
>> overhead
>> with
>> SHA-256 md5 checksum % performnce
>> overhead with md5 CRC checksum % performnce
>> overhead with
>> CRC
>> 10 (100 MB
>> in each table) real 0m10.957s
>> user 0m0.367s
>> sys 0m2.275s real 0m16.816s
>> user 0m0.210s
>> sys 0m2.067s 53% real 0m11.895s
>> user 0m0.174s
>> sys 0m1.725s 8% real 0m11.136s
>> user 0m0.365s
>> sys 0m2.298s 2%
>> 20 (100 MB
>> in each table) real 0m20.610s
>> user 0m0.484s
>> sys 0m3.198s real 0m31.745s
>> user 0m0.569s
>> sys 0m4.089s
>> 54% real 0m22.717s
>> user 0m0.638s
>> sys 0m4.026s 10% real 0m21.075s
>> user 0m0.538s
>> sys 0m3.417s 2%
>> 50 (100 MB
>> in each table) real 0m49.143s
>> user 0m1.646s
>> sys 0m8.499s real 1m13.683s
>> user 0m1.305s
>> sys 0m10.541s 50% real 0m51.856s
>> user 0m0.932s
>> sys 0m7.702s 6% real 0m49.689s
>> user 0m1.028s
>> sys 0m6.921s 1%
>> 100 (100 MB
>> in each table) real 1m34.308s
>> user 0m2.265s
>> sys 0m14.717s real 2m22.403s
>> user 0m2.613s
>> sys 0m20.776s 51% real 1m41.524s
>> user 0m2.158s
>> sys 0m15.949s
>> 8% real 1m35.045s
>> user 0m2.061s
>> sys 0m16.308s 1%
>> 100 (1 GB
>> in each table) real 17m18.336s
>> user 0m20.222s
>> sys 3m12.960s real 24m45.942s
>> user 0m26.911s
>> sys 3m33.501s 43% real 17m41.670s
>> user 0m26.506s
>> sys 3m18.402s 2% real 17m22.296s
>> user 0m26.811s
>> sys 3m56.653s
>>
>> sometimes, this test
>> completes within the
>> same time as without
>> checksum. approx. 0.5%
>>
>>
>> Considering the above results, I modified the earlier Robert's patch and
>> added
>> "manifest_with_checksums" option to pg_basebackup. With a new patch.
>> by default, checksums will be disabled and will be only enabled when
>> "manifest_with_checksums" option is provided. Also re-based all patch
>> set.
>>
>
> Review comments on 0004:
>
> 1.
> I don't think we need o_manifest_with_checksums variable,
> manifest_with_checksums can be used instead.
>

Yes, done in the latest version of opatch.

> 2.
> We need to document this new option for pg_basebackup and basebackup.
>
>
Done, attaching documentation patch with the mail.

3.
> Also, instead of keeping manifest_with_checksums as a global variable, we
> should pass that to the required function. Patch 0002 already modified the
> signature of all relevant functions anyways. So just need to add one more
> bool
> variable there.
>
>
yes, earlier I did that implementation but later found that we already
have checksum related global variable i.e. noverify_checksums, so
that it will be clean implementation - rather modifying the function
definition
to pass the variable (which is actually global for the operation).

4.
> Why we need a "File" at the start of each entry as we are adding files
> only?
> I wonder if we also need to provide a tablespace name and directory marker
> so
> that we have "Tablespace" and "Dir" at the start.
>

Sorry, I am not quite sure about this, may be Robert is right person
to answer this.

> 5.
> If I don't provide manifest-with-checksums option then too I see that
> checksum
> is calculated for backup_manifest file itself. Is that intentional or
> missed?
> I think we should omit that too if this option is not provided.
>
>
Oops yeah, corrected this in the latest version of patch.

6.
> Is it possible to get only a backup manifest from the server? A client like
> pg_basebackup can then use that to fetch files reading that.
>
>
Currently we don't have any option to just get the manifest file from the
server. I am not sure but why we need this at this point of time.

Regards,

Rushabh Lathia
www.EnterpriseDB.com

Attachment Content-Type Size
0001-Reduce-code-duplication-and-eliminate-weird-macro-tr.patch text/x-patch 34.8 KB
0003-Fix-warnings.patch text/x-patch 1.8 KB
0004-Make-checksum-optional-in-pg_basebackup-review-comme.patch text/x-patch 9.2 KB
0005-Documentation-for-backup-manifest-and-manifest-with-.patch text/x-patch 3.6 KB
0002-POC-of-backup-manifest-with-file-names-sizes-timesta.patch text/x-patch 21.0 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2019-11-22 09:57:58 Re: dropdb --force
Previous Message vignesh C 2019-11-22 09:46:19 Re: dropdb --force