Re: pg_waldump: support decoding of WAL inside tarfile

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Amul Sul <sulamul(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_waldump: support decoding of WAL inside tarfile
Date: 2026-03-11 17:08:04
Message-ID: 401bf08a-c8f1-48e2-9a30-78deaa9fa7c5@dunslane.net
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 2026-03-09 Mo 8:26 AM, Amul Sul wrote:
> On Sat, Mar 7, 2026 at 3:51 AM Andrew Dunstan<andrew(at)dunslane(dot)net> wrote:
>>
>> On 2026-03-04 We 4:50 PM, Andrew Dunstan wrote:
>>>
>>> On 2026-03-04 We 7:52 AM, Amul Sul wrote:
>>>> On Wed, Mar 4, 2026 at 6:07 AM Andrew Dunstan<andrew(at)dunslane(dot)net> wrote:
>>>>> On 2026-03-02 Mo 8:00 AM, Amul Sul wrote:
>>>>>> On Wed, Feb 18, 2026 at 12:28 PM Amul Sul<sulamul(at)gmail(dot)com> wrote:
>>>>>>> On Tue, Feb 10, 2026 at 3:06 PM Amul Sul<sulamul(at)gmail(dot)com> wrote:
>>>>>>>> On Wed, Feb 4, 2026 at 6:39 PM Amul Sul<sulamul(at)gmail(dot)com> wrote:
>>>>>>>>> On Wed, Jan 28, 2026 at 2:41 AM Robert Haas<robertmhaas(at)gmail(dot)com> wrote:
>>>>>>>>>> On Tue, Jan 27, 2026 at 7:07 AM Amul Sul<sulamul(at)gmail(dot)com> wrote:
>>>>>>>>>>> In the attached version, I am using the WAL segment name as the hash
>>>>>>>>>>> key, which is much more straightforward. I have rewritten
>>>>>>>>>>> read_archive_wal_page(), and it looks much cleaner than before. The
>>>>>>>>>>> logic to discard irrelevant WAL files is still within
>>>>>>>>>>> get_archive_wal_entry. I added an explanation for setting cur_wal to
>>>>>>>>>>> NULL, which is now handled in the separate function I mentioned
>>>>>>>>>>> previously.
>>>>>>>>>>>
>>>>>>>>>>> Kindly have a look at the attached version; let me know if you are
>>>>>>>>>>> still not happy with the current approach for filtering/discarding
>>>>>>>>>>> irrelevant WAL segments. It isn't much different from the previous
>>>>>>>>>>> version, but I have tried to keep it in a separate routine for better
>>>>>>>>>>> code readability, with comments to make it easier to understand. I
>>>>>>>>>>> also added a comment for ArchivedWALFile.
>>>>>>>>>> I feel like the division of labor between get_archive_wal_entry() and
>>>>>>>>>> read_archive_wal_page() is odd. I noticed this in the last version,
>>>>>>>>>> too, and it still seems to be the case. get_archive_wal_entry() first
>>>>>>>>>> calls ArchivedWAL_lookup(). If that finds an entry, it just returns.
>>>>>>>>>> If it doesn't, it loops until an entry for the requested file shows up
>>>>>>>>>> and then returns it. Then control returns to read_archive_wal_page()
>>>>>>>>>> which loops some more until we have all the data we need for the
>>>>>>>>>> requested file. But it seems odd to me to have two separate loops
>>>>>>>>>> here. I think that the first loop is going to call read_archive_file()
>>>>>>>>>> until we find the beginning of the file that we care about and then
>>>>>>>>>> the second one is going to call read_archive_file() some more until we
>>>>>>>>>> have read enough of it to satisfy the request. It feels odd to me to
>>>>>>>>>> do it that way, as if we told somebody to first wait until 9 o'clock
>>>>>>>>>> and then wait another 30 minutes, instead of just telling them to wait
>>>>>>>>>> until 9:30. I realize it's not quite the same thing, because apart
>>>>>>>>>> from calling read_archive_file(), the two loops do different things,
>>>>>>>>>> but I still think it looks odd.
>>>>>>>>>>
>>>>>>>>>> + /*
>>>>>>>>>> + * Ignore if the timeline is different or the current segment is not
>>>>>>>>>> + * the desired one.
>>>>>>>>>> + */
>>>>>>>>>> + XLogFromFileName(entry->fname, &curSegTimeline, &curSegNo, WalSegSz);
>>>>>>>>>> + if (privateInfo->timeline != curSegTimeline ||
>>>>>>>>>> + privateInfo->startSegNo > curSegNo ||
>>>>>>>>>> + privateInfo->endSegNo < curSegNo ||
>>>>>>>>>> + segno > curSegNo)
>>>>>>>>>> + {
>>>>>>>>>> + free_archive_wal_entry(entry->fname, privateInfo);
>>>>>>>>>> + continue;
>>>>>>>>>> + }
>>>>>>>>>>
>>>>>>>>>> The comment doesn't match the code. If it did, the test would be
>>>>>>>>>> (privateInfo->timeline != curSegTimeline || segno != curSegno). But
>>>>>>>>>> instead the segno test is > rather than !=, and the checks against
>>>>>>>>>> startSegNo and endSegNo aren't explained at all. I think I understand
>>>>>>>>>> why the segno test uses > rather than !=, but it's the point of the
>>>>>>>>>> comment to explain things like that, rather than leaving the reader to
>>>>>>>>>> guess. And I don't know why we also need to test startSegNo and
>>>>>>>>>> endSegNo.
>>>>>>>>>>
>>>>>>>>>> I also wonder what the point is of doing XLogFromFileName() on the
>>>>>>>>>> fname provided by the caller and then again on entry->fname. Couldn't
>>>>>>>>>> you just compare the strings?
>>>>>>>>>>
>>>>>>>>>> Again, the division of labor is really odd here. It's the job of
>>>>>>>>>> astreamer_waldump_content() to skip things that aren't WAL files at
>>>>>>>>>> all, but it's the job of get_archive_wal_entry() to skip things that
>>>>>>>>>> are WAL files but not the one we want. I disagree with putting those
>>>>>>>>>> checks in completely separate parts of the code.
>>>>>>>>>>
>>>>>>>>> Keeping the timeline and segment start-end range checks inside the
>>>>>>>>> archive streamer creates a circular dependency that cannot be resolved
>>>>>>>>> without a 'dirty hack'. We must read the first available WAL file page
>>>>>>>>> to determine the wal_segment_size before it can calculate the target
>>>>>>>>> segment range. Moving the checks inside the streamer would make it
>>>>>>>>> impossible to process that initial file, as the necessary filtering
>>>>>>>>> parameters -- would still be unknown which would need to be skipped
>>>>>>>>> for the first read somehow. What if later we realized that the first
>>>>>>>>> WAL file which was allowed to be streamed by skipping that check is
>>>>>>>>> irrelevant and doesn't fall under the start-end segment range?
>>>>>>>>>
>>>>>>>> Please have a look at the attached version, specifically patch 0005.
>>>>>>>> In astreamer_waldump_content(), I have moved the WAL file filtration
>>>>>>>> check from get_archive_wal_entry(). This check will be skipped during
>>>>>>>> the initial read in init_archive_reader(), which instead performs it
>>>>>>>> explicitly once it determines the WAL segment size and the start/end
>>>>>>>> segments.
>>>>>>>>
>>>>>>>> To access the WAL segment size inside astreamer_waldump_content(), I
>>>>>>>> have moved the WAL segment size variable into the XLogDumpPrivate
>>>>>>>> structure in the separate 0004 patch.
>>>>>>> Attached is an updated version including the aforesaid changes. It
>>>>>>> includes a new refactoring patch (0001) that moves the logic for
>>>>>>> identifying tar archives and their compression types from
>>>>>>> pg_basebackup and pg_verifybackup into a separate-reusable function,
>>>>>>> per a suggestion from Euler [1]. Additionally, I have added a test
>>>>>>> for the contrecord decoding to the main patch (now 0006).
>>>>>>>
>>>>>>> 1]http://postgr.es/m/3c8e7b02-2152-495a-a0b6-e37cf9286a70@app.fastmail.com
>>>>>>>
>>>>>> Rebased against the latest master, fixed typos in code comments, and
>>>>>> replaced palloc0 with palloc0_object.
>>>>>>
>>>>> Hi Amul.
>>>>>
>>>>>
>>>>> I think this looks in pretty good shape.
>>>>>
>>>> Thank you very much for looking at the patch.
>>>>
>>>>> Attached are patches for a few things I think could be fixed. They are
>>>>> mostly self-explanatory. The TAP test fix is the only sane way I could
>>>>> come up with stopping the skip code you had from reporting a wildly
>>>>> inaccurate number of tests skipped. The sane way to do this from a
>>>>> Test::More perspective is a subtest, but unfortunately meson does not
>>>>> like subtest output, which is why we don't use it elsewhere, so the only
>>>>> way I could come up with was to split this out into a separate test. Of
>>>>> course, we might just say we don't care about the misreport, in which
>>>>> case we could just live with things as they are.
>>>>>
>>>> I agree that the reported skip number was incorrect, and I have
>>>> corrected it in the attached patch. I haven't applied your patch for
>>>> the TAP test improvements yet because I wanted to double-check it
>>>> first with you; the patch as it stood created duplicate tests already
>>>> present in 001_basic.pl. To avoid this duplication, I have added a
>>>> loop that performs tests for both plain and tar WAL directory inputs,
>>>> similar to the approach used in pg_verifybackup for different
>>>> compression type tests (e.g., 008_untar.pl, 010_client_untar.pl). I
>>>> don't have any objection to doing so if you feel the duplication is
>>>> acceptable, but I feel that using a loop for the tests in 001_basic.pl
>>>> is a bit tidier. Let me know your thoughts.
>>>
>>> I will take a look.
>>>
>> I'm ok, with doing it this way. It's just a bit fragile - if we add a
>> test the number will be wrong. But maybe it's not worth worrying about.
>>
>> Everything else looks fairly good. The attached fixes a few relatively
>> minor issues in v15. The main one is that it stops allocating/freeing a
>> buffer every time we call read_archive_file() and instead adds a
>> reusable buffer. It also adds back wal-directory as an undocumented
>> alias of wal-path, to avoid breaking legacy scripts unnecessarily, and
>> adds constness to the fname argument of pg_tar_compress_algorithm, as
>> well as fixing some indentation and grammar issues.
>>
>> All in all I think we're in good shape.
> Thanks for the review. I have incorporated your suggested changes,
> with one exception: I have skipped the buffer reallocation code in
> read_archive_file(). Since we only handle two specific read sizes --
> XLOG_BLCKSZ and READ_CHUNK_SIZE (128 KB, we defined in
> archive_waldump.c) -- dynamic reallocation seems unnecessary. Instead,
> I moved the allocation to init_archive_reader(), which now initializes
> a buffer at READ_CHUNK_SIZE. I also added an assertion in
> read_archive_file() to ensure that no read request exceeds this
> allocated capacity.
>
> Kindly have a look at the attached version and let me know your thoughts.
>

Looks pretty good. I have squashed them into three patches I think are
committable. Also attached is a diff showing what's changed - mainly this:

. --follow + tar archive rejected (pg_waldump.c) — new validation
prevents a confusing pg_fatal when combining --follow with a tar archive
. error messages split (archive_waldump.c) — the single "could not read
file" error is now two distinct messages: "WAL segment is too short"
(truncated file) vs "unexpected end of archive" (archive EOF) - Fixes an
issue raised in review
. hash table cleanup (archive_waldump.c) — free_archive_reader now
iterates and frees all remaining hash entries and destroys the table

cheers

andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com

Attachment Content-Type Size
v17-0001-pg_waldump-Preparatory-refactoring-for-tar-archi.patch text/x-patch 20.8 KB
v17-0002-pg_waldump-Add-support-for-reading-WAL-from-tar-.patch text/x-patch 49.5 KB
v17-0003-pg_verifybackup-Enable-WAL-parsing-for-tar-forma.patch text/x-patch 16.0 KB
v17-changes.nocfbot text/plain 5.0 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Sami Imseih 2026-03-11 17:08:52 Re: another autovacuum scheduling thread
Previous Message Antonin Houska 2026-03-11 17:07:42 Re: Adding REPACK [concurrently]