| From: | Mats Kindahl <mats(dot)kindahl(at)gmail(dot)com> |
|---|---|
| To: | Japin Li <japinli(at)hotmail(dot)com> |
| Cc: | surya poondla <suryapoondla4(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: pg_rewind does not rewind diverging timelines |
| Date: | 2026-05-30 20:26:11 |
| Message-ID: | 9ce0d2b9-7a41-4a8a-b299-da295bb4514f@gmail.com |
| Lists: | pgsql-hackers |
Hi Japin,
On 5/29/26 04:01, Japin Li wrote:
> Hi, Mats
>
> On Tue, 26 May 2026 at 18:03, Mats Kindahl <mats(dot)kindahl(at)gmail(dot)com> wrote:
>> Attached a new version of the patch with the changes you suggested.
>>
> I found an error on the Windows platform [1].
>
> [07:08:28.538] >>> MALLOC_PERTURB_=168 PG_REGRESS=C:\cirrus\build\src/test\regress\pg_regress.exe REGRESS_SHLIB=C:\cirrus\build\src/test\regress\regress.dll MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 top_builddir=C:\cirrus\build UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 MESON_TEST_ITERATION=1 PATH=C:\cirrus\build\tmp_install\usr\local\pgsql\bin;C:\cirrus\build\src\bin\pg_rewind;C:/cirrus/build/src/bin/pg_rewind/test;C:\VS_2019\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64;C:\VS_2019\MSBuild\Current\bin\Roslyn;C:\Program Files (x86)\Windows Kits\10\bin\10.0.22621.0\x64;C:\Program Files (x86)\Windows Kits\10\bin\x64;C:\VS_2019\\MSBuild\Current\Bin;C:\Windows\Microsoft.NET\Framework64\v4.0.30319;C:\VS_2019\Common7\IDE\;C:\VS_2019\Common7\Tools\;C:\VS_2019\VC\Auxiliary\Build;C:\zstd\zstd-v1.5.2-win64;C:\zlib;C:\lz4;C:\icu;C:\winflexbison;C:\strawberry\5.42.0.1\perl\bin;C:\python\Scripts\;C:\python\;C:\Windows Kits\10\Debuggers\x64;C:\Program Files\Git\usr\bin;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Windows\System32\OpenSSH\;C:\ProgramData\GooGet;C:\Program Files\Google\Compute Engine\metadata_scripts;C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\bin;C:\Program Files\PowerShell\7\;C:\Program Files\Google\Compute Engine\sysprep;C:\ProgramData\chocolatey\bin;C:\Program Files\Git\cmd;C:\Program Files\Git\mingw64\bin;C:\Program Files\Git\usr\bin;C:\Windows\system32\config\systemprofile\AppData\Local\Microsoft\WindowsApps INITDB_TEMPLATE=C:/cirrus/build/tmp_install/initdb-template ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 share_contrib_dir=C:/cirrus/build/tmp_install//usr/local/pgsql/share/contrib C:\python\python3.EXE C:\cirrus\build\..\src/tools/testwrap --basedir C:\cirrus\build --srcdir C:\cirrus\src\bin\pg_rewind --pg-test-extra --testgroup pg_rewind --testname 005_same_timeline -- 
> C:\strawberry\5.42.0.1\perl\bin\perl.EXE -I C:/cirrus/src/test/perl -I C:\cirrus\src\bin\pg_rewind C:/cirrus/src/bin/pg_rewind/t/005_same_timeline.pl
> [07:08:28.538] ------------------------------------- 8< -------------------------------------
> [07:08:28.538] stderr:
> [07:08:28.538] # Failed test 'pg_rewind rewinds across mismatched TLI 2 / TLI 2-prime to TLI 1'
> [07:08:28.538] # at C:/cirrus/src/bin/pg_rewind/t/005_same_timeline.pl line 45.
> [07:08:28.538] # ---------- command failed ----------
> [07:08:28.538] # pg_rewind --debug --source-pgdata C:\cirrus\build/testrun/pg_rewind/005_same_timeline\data/t_005_same_timeline_node_b2_data/pgdata --target-pgdata C:\cirrus\build/testrun/pg_rewind/005_same_timeline\data/t_005_same_timeline_node_a2_data/pgdata --no-sync --config-file C:\cirrus\build\testrun\pg_rewind\005_same_timeline\data\tmp_test_ZCeZ/target-postgresql.conf.tmp --restore-target-wal
> [07:08:28.538] # -------------- stderr --------------
> [07:08:28.538] # pg_rewind: using for rewind "restore_command = 'cp "C:cirrusuild/testrun/pg_rewind/005_same_timelinedata/t_005_same_timeline_node_x_data/pgdata/pg_wal/%f" "%p"'"
> [07:08:28.538] # pg_rewind: Source timeline history:
> [07:08:28.538] # pg_rewind: 1: 0/00000000 - 0/040000E0
> [07:08:28.538] # pg_rewind: 2: 0/040000E0 - 0/00000000
> [07:08:28.538] # pg_rewind: Target timeline history:
> [07:08:28.538] # pg_rewind: 1: 0/00000000 - 0/040000E0
> [07:08:28.538] # pg_rewind: 2: 0/040000E0 - 0/060000E0
> [07:08:28.538] # pg_rewind: 3: 0/060000E0 - 0/00000000
> [07:08:28.538] # pg_rewind: servers diverged at WAL location 0/040000E0 on timeline 1
> [07:08:28.538] # cp: cannot stat 'C:cirrus'$'\b''uild/testrun/pg_rewind/005_same_timelinedata/t_005_same_timeline_node_x_data/pgdata/pg_wal/000000020000000000000004': No such file or directory
> [07:08:28.538] # pg_rewind: error: could not restore file "000000020000000000000004" from archive
> [07:08:28.538] # pg_rewind: error: could not find previous WAL record at 0/040000E0
> [07:08:28.538] # ------------------------------------
> [07:08:28.538] # Failed test 'rewound node reflects source history, not target TLI 2/TLI 3 data'
> [07:08:28.538] # at C:/cirrus/src/bin/pg_rewind/t/005_same_timeline.pl line 260.
> [07:08:28.538] # got: 'origin2
> [07:08:28.538] # x'
> [07:08:28.538] # expected: 'b
> [07:08:28.538] # origin2'
> [07:08:28.538] # Looks like you failed 2 tests of 11.
> [07:08:28.538]
> [07:08:28.538] (test program exited with status code 2)
> [07:08:28.538] ------------------------------------------------------------------------------
> [07:08:28.538]
>
>
> [1] https://cirrus-ci.com/task/6228217159221248
Thanks for testing it on Windows.
It seems the path needs to be cleaned up on Windows. I checked
Cluster.pm, adapted the code it uses for this, and added it to the
test, which should now work. See the attached patch.
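
For readers following along, here is a minimal sketch of what this kind of path cleaning can look like. It is not taken from the attached patch: the helper name `restore_command_for` and its `$is_windows` argument are hypothetical, and the assumption is that the cleaning amounts to doubling backslashes before the path is embedded in `restore_command`. That would be consistent with the log above, where the backslashes were consumed and `\b` turned into a backspace.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the patch): build a restore_command that
# copies WAL segments out of $wal_dir.
sub restore_command_for
{
	my ($wal_dir, $is_windows) = @_;

	if ($is_windows)
	{
		# Double every backslash so that a single one survives when the
		# value is unescaped later; otherwise "\b" ends up as a backspace.
		$wal_dir =~ s{\\}{\\\\}g;
		return qq{copy "$wal_dir\\\\%f" "%p"};
	}

	# Elsewhere, forward slashes need no special treatment.
	return qq{cp "$wal_dir/%f" "%p"};
}

print restore_command_for('C:\cirrus\build\pg_wal', 1), "\n";
print restore_command_for('/tmp/pg_wal', 0), "\n";
```

Both branches double-quote the path so that directories containing spaces keep working.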
I also noticed that many of the paths are not platform-agnostic. It
would be an idea to switch to something like File::Spec and build paths
using that, but it's out of scope for this patch.
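
As a rough illustration of the File::Spec idea (a sketch only, not something the patch does), the hypothetical helper `wal_segment_path` below joins path components with `File::Spec->catfile` instead of hard-coding a separator. The directory and segment names are just examples.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: locate a WAL segment under a data directory
# without hard-coding "/" or "\" as the separator.
sub wal_segment_path
{
	my ($pgdata, $segment) = @_;
	return File::Spec->catfile($pgdata, 'pg_wal', $segment);
}

print wal_segment_path('pgdata', '000000020000000000000004'), "\n";
```

Note that this only picks the native separator. Any escaping needed when the resulting path is embedded in a configuration value would still have to be handled separately.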
Best wishes,
Mats Kindahl, Multigres Engineer, Supabase
| Attachment | Content-Type | Size |
|---|---|---|
| v6.0001-pg_rewind-use-UUIDs-to-detect-independent-same-TLI-p.patch | text/x-patch | 37.8 KB |