Re: Making pg_rewind faster

From: Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: John H <johnhyvr(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Justin Kwan <justinpkwan(at)outlook(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, vignesh ravichandran <admin(at)viggy28(dot)dev>, "hlinnaka(at)iki(dot)fi" <hlinnaka(at)iki(dot)fi>
Subject: Re: Making pg_rewind faster
Date: 2025-10-23 16:08:23
Message-ID: CAFC+b6ryuc_5uwkNcHDXUYZWZ+zUydTPPTS5xiZY1Z_t-EpkjQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Michael,

On Thu, Oct 23, 2025 at 1:52 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:

> On Tue, Oct 21, 2025 at 04:14:55PM -0700, John H wrote:
> > Made the changes and included the documentation update.
>
> I have been looking at this patch, and the first part that stood out
> is that the refactoring related to isRelDataFile() can be done as an
> independent piece, cutting the total size of the main patch by 30% or
> so. So I have extracted this part, simplified it to make it more
> consistent and les noisy on HEAD, and applied it as 6ae08d9583e9.
>

yeah ... that's cleaner, thanks for committing.

>
> +# Sleep to allow mtime to be different
> +usleep(1000000);
> [...]
> + [qr/WAL segment \"$wal_seg_skipped\" not copied to target/,
> + qr/pg_wal\/$corrupt_wal_seg \(COPY\)/
> + ],
>
> FWIW, I am definitely not a fan of the test relying on the timestamp
> of the files based on their retrieved meta-data stats, with the mtime.
> I suspect complications on Windows. Worse thing, there is a forced
> sleep of 1s added to the test, to make sure that the timestamp of the
> files we compare have a different number. But do we really need that
> anyway if we have the debug information with the file map printed in
> it?
>

thought it would act like an extra sanity check ,to prove the point that the
pg_rewind is saying through the debug info that it has been really copied
or skipped.

>
> Hence, I think that we should simplify and rework the test a bit,
> tweaking a bit how the filemap is printed: now that we can detect if
> an operation is happening on a WAL file thanks to file_content_type_t,
> let's always print the type of operation done for each WAL file and
> remove this "not copied to target" debug log which provides the same
> information, then let's compare the debug output with what we want.
>

i guess this is what you want ,please let me know if it's otherwise,

diff --git a/src/bin/pg_rewind/filemap.c b/src/bin/pg_rewind/filemap.c
index 2d82edec060..2180495d337 100644
--- a/src/bin/pg_rewind/filemap.c
+++ b/src/bin/pg_rewind/filemap.c
@@ -546,7 +546,7 @@ print_filemap(filemap_t *filemap)
for (i = 0; i < filemap->nentries; i++)
{
entry = filemap->entries[i];
- if (entry->action != FILE_ACTION_NONE ||
+ if (entry->content_type == FILE_CONTENT_TYPE_WAL ||
entry->action != FILE_ACTION_NONE ||
entry->target_pages_to_overwrite.bitmapsize > 0)
{
pg_log_debug("%s (%s)", entry->path,
@@ -733,7 +733,6 @@ decide_wal_file_action(const char *fname, XLogSegNo
last_common_segno,
*/
if (file_segno < last_common_segno && source_size == target_size)
{
- pg_log_debug("WAL segment \"%s\" not copied to target",
fname);
return FILE_ACTION_NONE;
}

diff --git a/src/bin/pg_rewind/t/011_wal_copy.pl b/src/bin/pg_rewind/t/
011_wal_copy.pl
index b9e24844654..3e12744dfa6 100644
--- a/src/bin/pg_rewind/t/011_wal_copy.pl
+++ b/src/bin/pg_rewind/t/011_wal_copy.pl
@@ -93,7 +93,7 @@ command_checks_all(
0,
[qr//],
[
- qr/WAL segment \"$wal_seg_skipped\" not copied to target/,
+ qr/pg_wal\/$wal_seg_skipped \(NONE\)/,
qr/pg_wal\/$corrupt_wal_seg \(COPY\)/
],
'run pg_rewind');

> It seems to me that it would be good for the test to run some sanity
> checks on the rewound standby, as well. That would provide more
> validation for the case of the "corrupted" segment that gets copied
> anyway.
>

i think these checks help do the same

1) qr/pg_wal\/$corrupt_wal_seg \(COPY\)/
this shows the corrupted segment is copied.

2) ok( $corrupt_wal_seg_stat_before_rewind->size !=
$corrupt_wal_seg_source_stat->size,
"Expected WAL segment $corrupt_wal_seg to have different size in
source vs target before rewind"
);
my $corrupt_wal_seg_stat_after_rewind =
stat($corrupt_wal_seg_in_target_path);
ok( $corrupt_wal_seg_stat_after_rewind->size ==
$corrupt_wal_seg_source_stat->size,
"Expected WAL segment $corrupt_wal_seg file sizes to be same between
target and source after rewind as it was copied"
);
These checks show that before the corrupt segment's size on
rewound standby(target) was not the same as source but after
rewind it was the same as source,please let me know if i am
missing your point here.

--
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Konstantin Knizhnik 2025-10-23 16:21:33 Bug in pg_stat_statements
Previous Message David G. Johnston 2025-10-23 15:58:23 Re: CI: Add task that runs pgindent