From: | Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com> |
---|---|
To: | John H <johnhyvr(at)gmail(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, wenhui qiu <qiuwenhuifx(at)gmail(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Justin Kwan <justinpkwan(at)outlook(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, vignesh <vignesh(at)cloudflare(dot)com>, vignesh ravichandran <admin(at)viggy28(dot)dev>, "hlinnaka(at)iki(dot)fi" <hlinnaka(at)iki(dot)fi>, "jkwan(at)cloudflare(dot)com" <jkwan(at)cloudflare(dot)com> |
Subject: | Re: Making pg_rewind faster |
Date: | 2025-10-15 14:26:56 |
Message-ID: | CAFC+b6qGJXRffhRxL3xOyd1ceBQTXb2iVGOLsEMpOF8rYxseZA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On Wed, Oct 15, 2025 at 4:49 AM John H <johnhyvr(at)gmail(dot)com> wrote:
> Hi,
>
> On Fri, Oct 10, 2025 at 12:45 AM Srinath Reddy Sadipiralla
> <srinath2133(at)gmail(dot)com> wrote:
> >
> > ...
> > XLogFilePath , then validate this new path with the given path
> > ,this helps to catch invalid xlog files like
> pg_wal/00000001FFFFFFFFFFFFFF10.
> > ...
>
> Are you concerned that somehow these files, which are named like XLog
> files but actually
> aren't, are somehow created therefore we should sync them in case?
> I'm trying to understand how these files would be generated in the first
> place.
>
the problem is not when server generates them because
filename is properly calculated using the XLogRecPtr in
XLogWrite->XLogFileInit->XLogFileInitInternal->XLogFilePath
,the main problem is when if someone manually places an invalid WAL file
in pg_wal like 00000001FFFFFFFFFFFFFF10, IsXLogFileName will
consider it as valid ,so with the approach as i mentioned earlier we can
catch such cases.
>
> >
> > instead of passing last_common_segno down the call stack directly,
> > we can have a struct maybe called "rewindState" which will have the
> common
> > information related to both clusters involved in rewind ,so that in
> future
> > if we want to pass more such information down we won't be increasing
> > the number of arguments that need to be passed down to all functions in
> stack.
> >
>
> I don't feel too strongly about this. Not sure how we view
> future-proofing things,
> as in do we proactively wrap around structs or wait until there's an
> actual need.
>
> This is the first time it's been touched since it was introduced in 14 for
> what
> it's worth.
>
TBH first it seemed to me a good coding practice for future proofing,
then i checked if there's any place in postgres we are doing
something similar ,then found 1 in src/include/access/gist.h
typedef struct GISTDeletedPageContents
{
/* last xid which could see the page in a scan */
FullTransactionId deleteXid;
} GISTDeletedPageContents;
>
> >
> > we can improve the TAP test file header comment as
> > # Test the situation where we skip copying the same WAL files from
> source to target
> > # except if WAL file size differs.
> >
> > let me put it this way
> > 1) WALs are written to 000000010000000000000002 in primary.
> > ...
> > 7) so the last common WAL segment was 000000010000000000000003.
>
> Updated comments.
>
> >
> > ...
> > where entry->target_size != entry->source_size and we do copy,
> > like if stat->mtime is changed (which means file has been copied)
> > and target_stat->size != soucrce_stat->size here no error expected.
> >
>
> Updated patch with one additional segment has a byte appended to it
> on target but not source.
>
thanks for updating, i have attached a diff patch on
top of v9-0001 patch , where i tried to add more tests
to improve the validation that we copy the WAL file even
when it exists on both source and target but the size differs.
I also tried to rename some variables for readability,while
testing this out i found that usleep(1000); is not enough
to show that there has been a copy which changed the
stat->mtime of the file because i think the copy was very
fast (less than 1 millisecond) so the mtime of the
file after rewind's copy didn't change or filesystem precision
didn't caught the change, the copying of the file is confirmed
because the same file has different size before rewind and
same after rewind which these tests proved
ok($corrupt_wal_seg_stat_before_rewind->size !=
$corrupt_wal_seg_source_stat->size,"WAL segment $corrupt_wal_seg has
different size in source vs target before rewind");
ok 11 - WAL segment 000000010000000000000004 has different size in source
vs target before rewind
ok($corrupt_wal_seg_stat_after_rewind->size ==
$corrupt_wal_seg_source_stat->size, "WAL segment $corrupt_wal_seg was
copied: file sizes are same between target and source after rewind");
ok 12 - WAL segment 000000010000000000000004 was copied: file sizes are
same between target and source after rewind
and more over the pg_rewind confirmed this with a new debug
message which I added in this diff patch
diff --git a/src/bin/pg_rewind/filemap.c b/src/bin/pg_rewind/filemap.c
index 69d7728ce44..743484f4833 100644
--- a/src/bin/pg_rewind/filemap.c
+++ b/src/bin/pg_rewind/filemap.c
@@ -732,11 +732,11 @@ decide_wal_file_action(const char *fname, XLogSegNo
last_common_segno,
*/
if (file_segno < last_common_segno && source_size == target_size)
{
pg_log_debug("WAL segment \"%s\" not copied to target", fname);
return FILE_ACTION_NONE;
}
+ pg_log_debug("WAL segment \"%s\" is copied to target", fname);
return FILE_ACTION_COPY;
}
ok 6 - run pg_rewind stderr /(?^:WAL segment \"000000010000000000000004\"
is copied to target)/
also did this instead of using catfile
-my $wal_skipped_path = File::Spec->catfile($node_primary->data_dir,
'pg_wal', $wal_seg_skipped);
+my $wal_skipped_path = $node_primary->data_dir . '/pg_wal/' .
$wal_seg_skipped;
seems consistent with other TAP tests.
--
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/
Attachment | Content-Type | Size |
---|---|---|
v9-0001.diff | application/octet-stream | 7.1 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Heikki Linnakangas | 2025-10-15 14:35:58 | Re: get rid of RM_HEAP2_ID |
Previous Message | Tom Lane | 2025-10-15 14:16:01 | Re: Optimize LISTEN/NOTIFY |