Re: Making pg_rewind faster

From: Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: John H <johnhyvr(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Justin Kwan <justinpkwan(at)outlook(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, vignesh ravichandran <admin(at)viggy28(dot)dev>, "hlinnaka(at)iki(dot)fi" <hlinnaka(at)iki(dot)fi>
Subject: Re: Making pg_rewind faster
Date: 2025-10-24 12:21:54
Message-ID: CAFC+b6p4xhJN=kF4kVC1_yKQGU89Kwad5pEOpQyqQBuQ3GRenA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 24, 2025 at 2:08 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:

> On Fri, Oct 24, 2025 at 04:56:33PM +0900, Michael Paquier wrote:
> > I was just wondering about some queries on the new standby once we are
> > done with the rewind. But after sleeping on it, I'd be happy with
> > just the set of tests we have: the debug output checks, the size
> > checks pre and post-rewind, and no mtime.
>
> So, I am coming down to the attached (minus the commit message) after
> more edits and adjustments. It's Friday evening here, and I am not
> planning to do more today and to take bets on the buildfarm, even if
> the CI is OK.
>

Thanks for the updated patch, I have reviewed and tested the
patch, except these minor refactors ,it LGTM.

diff --git a/src/bin/pg_rewind/filemap.c b/src/bin/pg_rewind/filemap.c
index 59672e66932..467fd97ebcf 100644
--- a/src/bin/pg_rewind/filemap.c
+++ b/src/bin/pg_rewind/filemap.c
@@ -728,7 +728,7 @@ decide_wal_file_action(const char *fname, XLogSegNo
last_common_segno,
/*
* Avoid copying files before the last common segment.
*
- * These files exist on the source and the target services, so they
should
+ * These files exist on the source and the target servers, so they
should
* be identical and located strictly before the segment that
contains the
* LSN where target and source servers have diverged.
*
I think as we are not using mtime to show that the file has not been copied
and been skipped ,instead we are doing the same with the debug message
(qr/pg_wal\/$wal_seg_skipped \(NONE\)/,), so stat calculation of this WAL
segment can be removed.

diff --git a/src/bin/pg_rewind/t/011_wal_copy.pl b/src/bin/pg_rewind/t/
011_wal_copy.pl
index fb5b7104378..c4d7200d710 100644
--- a/src/bin/pg_rewind/t/011_wal_copy.pl
+++ b/src/bin/pg_rewind/t/011_wal_copy.pl
@@ -43,12 +43,6 @@ RewindTest::promote_standby;
my $new_timeline_wal_seg = $node_standby->safe_psql('postgres',
'SELECT pg_walfile_name(pg_current_wal_lsn())');

-# Get some stats info for the WAL file whose copy is skipped.
-my $wal_skipped_path =
- $node_primary->data_dir . '/pg_wal/' . $wal_seg_skipped;
-my $wal_skipped_stat = stat($wal_skipped_path);
-defined($wal_skipped_stat) or die("unable to stat $wal_skipped_path");
-
# Corrupt a WAL segment on target that has been generated before the
# divergence point. We will check that it is copied from the source.

*Test:*
I tried to demonstrate that with this patch, we are not copying WAL
segments before the divergence point by using wal_keep_size.

*Environment:*
A primary and standby physical streaming replication setup was used.
The TPC-H dbgen tool with a Scale Factor of 10 was used to generate
data, which in turn generated sufficient WAL to test with wal_keep_size =
20000 MB.
A failover and rewind were performed.

*Result:*

*Without Patch:*

SELECT
pg_size_pretty(sum(size)) AS wal_size
FROM
pg_ls_waldir();
wal_size
----------
19 GB
(1 row)

pg_rewind: connected to server
pg_rewind: servers diverged at WAL location 4/DEACBAB0 on timeline 1
pg_rewind: rewinding from last common checkpoint at 4/C8250D28 on timeline 1
pg_rewind: reading source file list
pg_rewind: reading target file list
pg_rewind: reading WAL in target
*pg_rewind: need to copy 20449 MB (total source directory size is 32778 MB)*
20940602/20940602 kB (100%) copied
pg_rewind: creating backup label and updating control file
pg_rewind: syncing target data directory
pg_rewind: Done!

To show that only a small amount of WAL was generated between the last
common
checkpoint and the latest checkpoint on the source:

postgres=# select pg_size_pretty('4/DEACBB78'::pg_lsn -
'4/C8250D28'::pg_lsn);
pg_size_pretty
----------------
360 MB
(1 row)

so the 20.449 GB copied was primarily due to wal_keep_size
(19 GB of accumulated WAL) being copied, as previously non-data files were
copied in full.

*With Patch:*

SELECT
pg_size_pretty(sum(size)) AS wal_size
FROM
pg_ls_waldir();
wal_size
----------
19 GB
(1 row)

pg_rewind: connected to server
pg_rewind: servers diverged at WAL location 4/B0F2FE78 on timeline 1
pg_rewind: rewinding from last common checkpoint at 4/B087E468 on timeline 1
pg_rewind: reading source file list
pg_rewind: reading target file list
pg_rewind: reading WAL in target
*pg_rewind: need to copy 212 MB (total source directory size is 32074 MB)*
*pg_rewind: skipped WAL 19168 MB* -> tweaked calculate_totals to get this,
just to give more info (not included in patch).
217467/217467 kB (100%) copied
pg_rewind: creating backup label and updating control file
pg_rewind: syncing target data directory
pg_rewind: Done!

postgres=# select pg_size_pretty('4/B0F30AB0'::pg_lsn -
'4/B087E468'::pg_lsn);
pg_size_pretty
----------------
6855 kB
(1 row)

This test shows that with the patch applied, only the necessary WAL segments
after the divergence point are copied, while previously accumulated WAL
segments (due to wal_keep_size) are correctly skipped.

--
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Álvaro Herrera 2025-10-24 12:24:20 Re: Avoid handle leak (src/bin/pg_ctl/pg_ctl.c)
Previous Message Álvaro Herrera 2025-10-24 12:21:42 Re: Avoid resource leak (src/test/regress/pg_regress.c)