Quick Links

Re: pg_waldump: support decoding of WAL inside tarfile

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Amul Sul <sulamul(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: pg_waldump: support decoding of WAL inside tarfile
Date:	2026-03-06 22:21:09
Message-ID:	72973471-8946-40c7-8b2d-8f95540d90e2@dunslane.net
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 2026-03-04 We 4:50 PM, Andrew Dunstan wrote:
>
>
> On 2026-03-04 We 7:52 AM, Amul Sul wrote:
>> On Wed, Mar 4, 2026 at 6:07 AM Andrew Dunstan<andrew(at)dunslane(dot)net> wrote:
>>> On 2026-03-02 Mo 8:00 AM, Amul Sul wrote:
>>>> On Wed, Feb 18, 2026 at 12:28 PM Amul Sul<sulamul(at)gmail(dot)com> wrote:
>>>>> On Tue, Feb 10, 2026 at 3:06 PM Amul Sul<sulamul(at)gmail(dot)com> wrote:
>>>>>> On Wed, Feb 4, 2026 at 6:39 PM Amul Sul<sulamul(at)gmail(dot)com> wrote:
>>>>>>> On Wed, Jan 28, 2026 at 2:41 AM Robert Haas<robertmhaas(at)gmail(dot)com> wrote:
>>>>>>>> On Tue, Jan 27, 2026 at 7:07 AM Amul Sul<sulamul(at)gmail(dot)com> wrote:
>>>>>>>>> In the attached version, I am using the WAL segment name as the hash
>>>>>>>>> key, which is much more straightforward. I have rewritten
>>>>>>>>> read_archive_wal_page(), and it looks much cleaner than before. The
>>>>>>>>> logic to discard irrelevant WAL files is still within
>>>>>>>>> get_archive_wal_entry. I added an explanation for setting cur_wal to
>>>>>>>>> NULL, which is now handled in the separate function I mentioned
>>>>>>>>> previously.
>>>>>>>>>
>>>>>>>>> Kindly have a look at the attached version; let me know if you are
>>>>>>>>> still not happy with the current approach for filtering/discarding
>>>>>>>>> irrelevant WAL segments. It isn't much different from the previous
>>>>>>>>> version, but I have tried to keep it in a separate routine for better
>>>>>>>>> code readability, with comments to make it easier to understand. I
>>>>>>>>> also added a comment for ArchivedWALFile.
>>>>>>>> I feel like the division of labor between get_archive_wal_entry() and
>>>>>>>> read_archive_wal_page() is odd. I noticed this in the last version,
>>>>>>>> too, and it still seems to be the case. get_archive_wal_entry() first
>>>>>>>> calls ArchivedWAL_lookup(). If that finds an entry, it just returns.
>>>>>>>> If it doesn't, it loops until an entry for the requested file shows up
>>>>>>>> and then returns it. Then control returns to read_archive_wal_page()
>>>>>>>> which loops some more until we have all the data we need for the
>>>>>>>> requested file. But it seems odd to me to have two separate loops
>>>>>>>> here. I think that the first loop is going to call read_archive_file()
>>>>>>>> until we find the beginning of the file that we care about and then
>>>>>>>> the second one is going to call read_archive_file() some more until we
>>>>>>>> have read enough of it to satisfy the request. It feels odd to me to
>>>>>>>> do it that way, as if we told somebody to first wait until 9 o'clock
>>>>>>>> and then wait another 30 minutes, instead of just telling them to wait
>>>>>>>> until 9:30. I realize it's not quite the same thing, because apart
>>>>>>>> from calling read_archive_file(), the two loops do different things,
>>>>>>>> but I still think it looks odd.
>>>>>>>>
>>>>>>>> + /*
>>>>>>>> + * Ignore if the timeline is different or the current segment is not
>>>>>>>> + * the desired one.
>>>>>>>> + */
>>>>>>>> + XLogFromFileName(entry->fname, &curSegTimeline, &curSegNo, WalSegSz);
>>>>>>>> + if (privateInfo->timeline != curSegTimeline ||
>>>>>>>> + privateInfo->startSegNo > curSegNo ||
>>>>>>>> + privateInfo->endSegNo < curSegNo ||
>>>>>>>> + segno > curSegNo)
>>>>>>>> + {
>>>>>>>> + free_archive_wal_entry(entry->fname, privateInfo);
>>>>>>>> + continue;
>>>>>>>> + }
>>>>>>>>
>>>>>>>> The comment doesn't match the code. If it did, the test would be
>>>>>>>> (privateInfo->timeline != curSegTimeline || segno != curSegno). But
>>>>>>>> instead the segno test is > rather than !=, and the checks against
>>>>>>>> startSegNo and endSegNo aren't explained at all. I think I understand
>>>>>>>> why the segno test uses > rather than !=, but it's the point of the
>>>>>>>> comment to explain things like that, rather than leaving the reader to
>>>>>>>> guess. And I don't know why we also need to test startSegNo and
>>>>>>>> endSegNo.
>>>>>>>>
>>>>>>>> I also wonder what the point is of doing XLogFromFileName() on the
>>>>>>>> fname provided by the caller and then again on entry->fname. Couldn't
>>>>>>>> you just compare the strings?
>>>>>>>>
>>>>>>>> Again, the division of labor is really odd here. It's the job of
>>>>>>>> astreamer_waldump_content() to skip things that aren't WAL files at
>>>>>>>> all, but it's the job of get_archive_wal_entry() to skip things that
>>>>>>>> are WAL files but not the one we want. I disagree with putting those
>>>>>>>> checks in completely separate parts of the code.
>>>>>>>>
>>>>>>> Keeping the timeline and segment start-end range checks inside the
>>>>>>> archive streamer creates a circular dependency that cannot be resolved
>>>>>>> without a 'dirty hack'. We must read the first available WAL file page
>>>>>>> to determine the wal_segment_size before it can calculate the target
>>>>>>> segment range. Moving the checks inside the streamer would make it
>>>>>>> impossible to process that initial file, as the necessary filtering
>>>>>>> parameters -- would still be unknown which would need to be skipped
>>>>>>> for the first read somehow. What if later we realized that the first
>>>>>>> WAL file which was allowed to be streamed by skipping that check is
>>>>>>> irrelevant and doesn't fall under the start-end segment range?
>>>>>>>
>>>>>> Please have a look at the attached version, specifically patch 0005.
>>>>>> In astreamer_waldump_content(), I have moved the WAL file filtration
>>>>>> check from get_archive_wal_entry(). This check will be skipped during
>>>>>> the initial read in init_archive_reader(), which instead performs it
>>>>>> explicitly once it determines the WAL segment size and the start/end
>>>>>> segments.
>>>>>>
>>>>>> To access the WAL segment size inside astreamer_waldump_content(), I
>>>>>> have moved the WAL segment size variable into the XLogDumpPrivate
>>>>>> structure in the separate 0004 patch.
>>>>> Attached is an updated version including the aforesaid changes. It
>>>>> includes a new refactoring patch (0001) that moves the logic for
>>>>> identifying tar archives and their compression types from
>>>>> pg_basebackup and pg_verifybackup into a separate-reusable function,
>>>>> per a suggestion from Euler [1]. Additionally, I have added a test
>>>>> for the contrecord decoding to the main patch (now 0006).
>>>>>
>>>>> 1]http://postgr.es/m/3c8e7b02-2152-495a-a0b6-e37cf9286a70@app.fastmail.com
>>>>>
>>>> Rebased against the latest master, fixed typos in code comments, and
>>>> replaced palloc0 with palloc0_object.
>>>>
>>> Hi Amul.
>>>
>>>
>>> I think this looks in pretty good shape.
>>>
>> Thank you very much for looking at the patch.
>>
>>> Attached are patches for a few things I think could be fixed. They are
>>> mostly self-explanatory. The TAP test fix is the only sane way I could
>>> come up with stopping the skip code you had from reporting a wildly
>>> inaccurate number of tests skipped. The sane way to do this from a
>>> Test::More perspective is a subtest, but unfortunately meson does not
>>> like subtest output, which is why we don't use it elsewhere, so the only
>>> way I could come up with was to split this out into a separate test. Of
>>> course, we might just say we don't care about the misreport, in which
>>> case we could just live with things as they are.
>>>
>> I agree that the reported skip number was incorrect, and I have
>> corrected it in the attached patch. I haven't applied your patch for
>> the TAP test improvements yet because I wanted to double-check it
>> first with you; the patch as it stood created duplicate tests already
>> present in 001_basic.pl. To avoid this duplication, I have added a
>> loop that performs tests for both plain and tar WAL directory inputs,
>> similar to the approach used in pg_verifybackup for different
>> compression type tests (e.g., 008_untar.pl, 010_client_untar.pl). I
>> don't have any objection to doing so if you feel the duplication is
>> acceptable, but I feel that using a loop for the tests in 001_basic.pl
>> is a bit tidier. Let me know your thoughts.
>
>
> I will take a look.
>

I'm ok, with doing it this way. It's just a bit fragile - if we add a
test the number will be wrong. But maybe it's not worth worrying about.

Everything else looks fairly good. The attached fixes a few relatively
minor issues in v15. The main one is that it stops allocating/freeing a
buffer every time we call read_archive_file() and instead adds a
reusable buffer. It also adds back wal-directory as an undocumented
alias of wal-path, to avoid breaking legacy scripts unnecessarily, and
adds constness to the fname argument of pg_tar_compress_algorithm, as
well as fixing some indentation and grammar issues.

All in all I think we're in good shape.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Attachment	Content-Type	Size
cf5955-fixes.patch.no-cfbot	text/plain	7.1 KB

In response to

Re: pg_waldump: support decoding of WAL inside tarfile at 2026-03-04 21:50:30 from Andrew Dunstan

Responses

Re: pg_waldump: support decoding of WAL inside tarfile at 2026-03-09 12:26:50 from Amul Sul

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Zsolt Parragi	2026-03-06 22:44:43	Re: [oauth] Stabilize the libpq-oauth ABI (and allow alternative implementations?)
Previous Message	Nathan Bossart	2026-03-06 22:15:12	Re: enhance wraparound warnings