Re: Speed up pg_checksums in cases where checksum already set

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Greg Sabino Mullane <htamfids(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speed up pg_checksums in cases where checksum already set
Date: 2021-05-27 02:17:03
Message-ID: YK8BH74lQNdb8Ro+@paquier.xyz
Lists: pgsql-hackers

On Wed, May 26, 2021 at 05:23:55PM -0400, Greg Sabino Mullane wrote:
> The attached patch makes an optimization to pg_checksums which prevents
> rewriting the block if the checksum is already what we expect. This can
> lead to much faster runs in cases where it is already set (e.g. enabled ->
> disabled -> enabled, external helper process, interrupted runs, future
> parallel processes).

Makes sense.
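
For reference, the shape of this inside scan_file()'s per-block loop
would be something like the following sketch (the blocks_modified
counter and the error handling are illustrative, not lifted from the
patch):

    csum = pg_checksum_page(buf.data, blockno + segmentno * RELSEG_SIZE);
    if (mode == PG_MODE_ENABLE)
    {
        /* Skip the rewrite if the stored checksum already matches. */
        if (header->pd_checksum == csum)
            continue;

        header->pd_checksum = csum;

        /* Seek back to the beginning of the block and rewrite it. */
        if (lseek(f, (off_t) blockno * BLCKSZ, SEEK_SET) < 0)
        {
            pg_log_error("seek failed for block %u in file \"%s\": %m",
                         blockno, fn);
            exit(1);
        }
        if (write(f, buf.data, BLCKSZ) != BLCKSZ)
        {
            pg_log_error("could not write block %u in file \"%s\": %m",
                         blockno, fn);
            exit(1);
        }
        blocks_modified++;
    }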

> There is also an effort to not sync the data directory
> if no changes were written. Finally, I added a bit more output on how many
> files were actually changed, e.g.:

-    if (do_sync)
+    if (do_sync && total_files_modified)
     {
         pg_log_info("syncing data directory");
         fsync_pgdata(DataDir, PG_VERSION_NUM);

Here, I am on the fence. It could be an advantage to force a flush of
the data folder anyway, no? Say all the pages already have a correct
checksum and sit in the OS cache, but they may not have been flushed
to disk yet. That would emulate what initdb -S already does.

> Checksum operation completed
> Files scanned: 1236
> Blocks scanned: 23283
> Files modified: 38
> Blocks modified: 19194
> pg_checksums: syncing data directory
> pg_checksums: updating control file
> Checksums enabled in cluster

The addition of the number of files modified looks like an improvement.
--
Michael
