Re: pg_waldump: support decoding of WAL inside tarfile

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Amul Sul <sulamul(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_waldump: support decoding of WAL inside tarfile
Date: 2026-03-04 00:37:48
Message-ID: dd8f7a20-f692-47cf-be79-fd6b54d4a1a6@dunslane.net
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 2026-03-02 Mo 8:00 AM, Amul Sul wrote:
> On Wed, Feb 18, 2026 at 12:28 PM Amul Sul <sulamul(at)gmail(dot)com> wrote:
>> On Tue, Feb 10, 2026 at 3:06 PM Amul Sul <sulamul(at)gmail(dot)com> wrote:
>>> On Wed, Feb 4, 2026 at 6:39 PM Amul Sul <sulamul(at)gmail(dot)com> wrote:
>>>> On Wed, Jan 28, 2026 at 2:41 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>>> On Tue, Jan 27, 2026 at 7:07 AM Amul Sul <sulamul(at)gmail(dot)com> wrote:
>>>>>> In the attached version, I am using the WAL segment name as the hash
>>>>>> key, which is much more straightforward. I have rewritten
>>>>>> read_archive_wal_page(), and it looks much cleaner than before. The
>>>>>> logic to discard irrelevant WAL files is still within
>>>>>> get_archive_wal_entry. I added an explanation for setting cur_wal to
>>>>>> NULL, which is now handled in the separate function I mentioned
>>>>>> previously.
>>>>>>
>>>>>> Kindly have a look at the attached version; let me know if you are
>>>>>> still not happy with the current approach for filtering/discarding
>>>>>> irrelevant WAL segments. It isn't much different from the previous
>>>>>> version, but I have tried to keep it in a separate routine for better
>>>>>> code readability, with comments to make it easier to understand. I
>>>>>> also added a comment for ArchivedWALFile.
>>>>> I feel like the division of labor between get_archive_wal_entry() and
>>>>> read_archive_wal_page() is odd. I noticed this in the last version,
>>>>> too, and it still seems to be the case. get_archive_wal_entry() first
>>>>> calls ArchivedWAL_lookup(). If that finds an entry, it just returns.
>>>>> If it doesn't, it loops until an entry for the requested file shows up
>>>>> and then returns it. Then control returns to read_archive_wal_page()
>>>>> which loops some more until we have all the data we need for the
>>>>> requested file. But it seems odd to me to have two separate loops
>>>>> here. I think that the first loop is going to call read_archive_file()
>>>>> until we find the beginning of the file that we care about and then
>>>>> the second one is going to call read_archive_file() some more until we
>>>>> have read enough of it to satisfy the request. It feels odd to me to
>>>>> do it that way, as if we told somebody to first wait until 9 o'clock
>>>>> and then wait another 30 minutes, instead of just telling them to wait
>>>>> until 9:30. I realize it's not quite the same thing, because apart
>>>>> from calling read_archive_file(), the two loops do different things,
>>>>> but I still think it looks odd.
>>>>>
>>>>> + /*
>>>>> + * Ignore if the timeline is different or the current segment is not
>>>>> + * the desired one.
>>>>> + */
>>>>> + XLogFromFileName(entry->fname, &curSegTimeline, &curSegNo, WalSegSz);
>>>>> + if (privateInfo->timeline != curSegTimeline ||
>>>>> + privateInfo->startSegNo > curSegNo ||
>>>>> + privateInfo->endSegNo < curSegNo ||
>>>>> + segno > curSegNo)
>>>>> + {
>>>>> + free_archive_wal_entry(entry->fname, privateInfo);
>>>>> + continue;
>>>>> + }
>>>>>
>>>>> The comment doesn't match the code. If it did, the test would be
>>>>> (privateInfo->timeline != curSegTimeline || segno != curSegno). But
>>>>> instead the segno test is > rather than !=, and the checks against
>>>>> startSegNo and endSegNo aren't explained at all. I think I understand
>>>>> why the segno test uses > rather than !=, but it's the point of the
>>>>> comment to explain things like that, rather than leaving the reader to
>>>>> guess. And I don't know why we also need to test startSegNo and
>>>>> endSegNo.
>>>>>
>>>>> I also wonder what the point is of doing XLogFromFileName() on the
>>>>> fname provided by the caller and then again on entry->fname. Couldn't
>>>>> you just compare the strings?
>>>>>
>>>>> Again, the division of labor is really odd here. It's the job of
>>>>> astreamer_waldump_content() to skip things that aren't WAL files at
>>>>> all, but it's the job of get_archive_wal_entry() to skip things that
>>>>> are WAL files but not the one we want. I disagree with putting those
>>>>> checks in completely separate parts of the code.
>>>>>
>>>> Keeping the timeline and segment start-end range checks inside the
>>>> archive streamer creates a circular dependency that cannot be resolved
>>>> without a 'dirty hack'. We must read the first available WAL file page
>>>> to determine the wal_segment_size before it can calculate the target
>>>> segment range. Moving the checks inside the streamer would make it
>>>> impossible to process that initial file, as the necessary filtering
>>>> parameters -- would still be unknown which would need to be skipped
>>>> for the first read somehow. What if later we realized that the first
>>>> WAL file which was allowed to be streamed by skipping that check is
>>>> irrelevant and doesn't fall under the start-end segment range?
>>>>
>>> Please have a look at the attached version, specifically patch 0005.
>>> In astreamer_waldump_content(), I have moved the WAL file filtration
>>> check from get_archive_wal_entry(). This check will be skipped during
>>> the initial read in init_archive_reader(), which instead performs it
>>> explicitly once it determines the WAL segment size and the start/end
>>> segments.
>>>
>>> To access the WAL segment size inside astreamer_waldump_content(), I
>>> have moved the WAL segment size variable into the XLogDumpPrivate
>>> structure in the separate 0004 patch.
>> Attached is an updated version including the aforesaid changes. It
>> includes a new refactoring patch (0001) that moves the logic for
>> identifying tar archives and their compression types from
>> pg_basebackup and pg_verifybackup into a separate-reusable function,
>> per a suggestion from Euler [1]. Additionally, I have added a test
>> for the contrecord decoding to the main patch (now 0006).
>>
>> 1] http://postgr.es/m/3c8e7b02-2152-495a-a0b6-e37cf9286a70@app.fastmail.com
>>
> Rebased against the latest master, fixed typos in code comments, and
> replaced palloc0 with palloc0_object.
>

Hi Amul.

I think this looks in pretty good shape.

Attached are patches for a few things I think could be fixed. They are
mostly self-explanatory. The TAP test fix is the only sane way I could
come up with stopping the skip code you had from reporting a wildly
inaccurate number of tests skipped. The sane way to do this from a
Test::More perspective is a subtest, but unfortunately meson does not
like subtest output, which is why we don't use it elsewhere, so the only
way I could come up with was to split this out into a separate test. Of
course, we might just say we don't care about the misreport, in which
case we could just live with things as they are.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Attachment Content-Type Size
cf5955-tar-wal-test.patch.no-cfbot text/plain 1.4 KB
cf5955-tap-test-fix.patch.no-cfbot text/plain 17.5 KB
cf5955-docs.patch.no-cfbot text/plain 2.4 KB
cf5955-fixes.patch.no-cfbot text/plain 3.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Chao Li 2026-03-04 00:44:17 Re: Is it OK to perform logging while holding a LWLock?
Previous Message Paul Bunn 2026-03-04 00:36:09 [BUG + PATCH] DSA pagemap out-of-bounds in make_new_segment odd-sized path