Re: pg_waldump: support decoding of WAL inside tarfile

From: Amul Sul <sulamul(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_waldump: support decoding of WAL inside tarfile
Date: 2026-03-09 12:26:50
Message-ID: CAAJ_b94hkpAJ-Q8FKjGgpbRTmmXOd-agKo1Eii4yOH2N++N36Q@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Mar 7, 2026 at 3:51 AM Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
>
> On 2026-03-04 We 4:50 PM, Andrew Dunstan wrote:
> >
> >
> > On 2026-03-04 We 7:52 AM, Amul Sul wrote:
> >> On Wed, Mar 4, 2026 at 6:07 AM Andrew Dunstan<andrew(at)dunslane(dot)net> wrote:
> >>> On 2026-03-02 Mo 8:00 AM, Amul Sul wrote:
> >>>> On Wed, Feb 18, 2026 at 12:28 PM Amul Sul<sulamul(at)gmail(dot)com> wrote:
> >>>>> On Tue, Feb 10, 2026 at 3:06 PM Amul Sul<sulamul(at)gmail(dot)com> wrote:
> >>>>>> On Wed, Feb 4, 2026 at 6:39 PM Amul Sul<sulamul(at)gmail(dot)com> wrote:
> >>>>>>> On Wed, Jan 28, 2026 at 2:41 AM Robert Haas<robertmhaas(at)gmail(dot)com> wrote:
> >>>>>>>> On Tue, Jan 27, 2026 at 7:07 AM Amul Sul<sulamul(at)gmail(dot)com> wrote:
> >>>>>>>>> In the attached version, I am using the WAL segment name as the hash
> >>>>>>>>> key, which is much more straightforward. I have rewritten
> >>>>>>>>> read_archive_wal_page(), and it looks much cleaner than before. The
> >>>>>>>>> logic to discard irrelevant WAL files is still within
> >>>>>>>>> get_archive_wal_entry. I added an explanation for setting cur_wal to
> >>>>>>>>> NULL, which is now handled in the separate function I mentioned
> >>>>>>>>> previously.
> >>>>>>>>>
> >>>>>>>>> Kindly have a look at the attached version; let me know if you are
> >>>>>>>>> still not happy with the current approach for filtering/discarding
> >>>>>>>>> irrelevant WAL segments. It isn't much different from the previous
> >>>>>>>>> version, but I have tried to keep it in a separate routine for better
> >>>>>>>>> code readability, with comments to make it easier to understand. I
> >>>>>>>>> also added a comment for ArchivedWALFile.
> >>>>>>>> I feel like the division of labor between get_archive_wal_entry() and
> >>>>>>>> read_archive_wal_page() is odd. I noticed this in the last version,
> >>>>>>>> too, and it still seems to be the case. get_archive_wal_entry() first
> >>>>>>>> calls ArchivedWAL_lookup(). If that finds an entry, it just returns.
> >>>>>>>> If it doesn't, it loops until an entry for the requested file shows up
> >>>>>>>> and then returns it. Then control returns to read_archive_wal_page()
> >>>>>>>> which loops some more until we have all the data we need for the
> >>>>>>>> requested file. But it seems odd to me to have two separate loops
> >>>>>>>> here. I think that the first loop is going to call read_archive_file()
> >>>>>>>> until we find the beginning of the file that we care about and then
> >>>>>>>> the second one is going to call read_archive_file() some more until we
> >>>>>>>> have read enough of it to satisfy the request. It feels odd to me to
> >>>>>>>> do it that way, as if we told somebody to first wait until 9 o'clock
> >>>>>>>> and then wait another 30 minutes, instead of just telling them to wait
> >>>>>>>> until 9:30. I realize it's not quite the same thing, because apart
> >>>>>>>> from calling read_archive_file(), the two loops do different things,
> >>>>>>>> but I still think it looks odd.
> >>>>>>>>
> >>>>>>>> + /*
> >>>>>>>> + * Ignore if the timeline is different or the current segment is not
> >>>>>>>> + * the desired one.
> >>>>>>>> + */
> >>>>>>>> + XLogFromFileName(entry->fname, &curSegTimeline, &curSegNo, WalSegSz);
> >>>>>>>> + if (privateInfo->timeline != curSegTimeline ||
> >>>>>>>> + privateInfo->startSegNo > curSegNo ||
> >>>>>>>> + privateInfo->endSegNo < curSegNo ||
> >>>>>>>> + segno > curSegNo)
> >>>>>>>> + {
> >>>>>>>> + free_archive_wal_entry(entry->fname, privateInfo);
> >>>>>>>> + continue;
> >>>>>>>> + }
> >>>>>>>>
> >>>>>>>> The comment doesn't match the code. If it did, the test would be
> >>>>>>>> (privateInfo->timeline != curSegTimeline || segno != curSegno). But
> >>>>>>>> instead the segno test is > rather than !=, and the checks against
> >>>>>>>> startSegNo and endSegNo aren't explained at all. I think I understand
> >>>>>>>> why the segno test uses > rather than !=, but it's the point of the
> >>>>>>>> comment to explain things like that, rather than leaving the reader to
> >>>>>>>> guess. And I don't know why we also need to test startSegNo and
> >>>>>>>> endSegNo.
> >>>>>>>>
> >>>>>>>> I also wonder what the point is of doing XLogFromFileName() on the
> >>>>>>>> fname provided by the caller and then again on entry->fname. Couldn't
> >>>>>>>> you just compare the strings?
> >>>>>>>>
> >>>>>>>> Again, the division of labor is really odd here. It's the job of
> >>>>>>>> astreamer_waldump_content() to skip things that aren't WAL files at
> >>>>>>>> all, but it's the job of get_archive_wal_entry() to skip things that
> >>>>>>>> are WAL files but not the one we want. I disagree with putting those
> >>>>>>>> checks in completely separate parts of the code.
> >>>>>>>>
> >>>>>>> Keeping the timeline and segment start-end range checks inside the
> >>>>>>> archive streamer creates a circular dependency that cannot be resolved
> >>>>>>> without a 'dirty hack'. We must read the first available WAL file page
> >>>>>>> to determine the wal_segment_size before it can calculate the target
> >>>>>>> segment range. Moving the checks inside the streamer would make it
> >>>>>>> impossible to process that initial file, as the necessary filtering
> >>>>>>> parameters -- would still be unknown which would need to be skipped
> >>>>>>> for the first read somehow. What if later we realized that the first
> >>>>>>> WAL file which was allowed to be streamed by skipping that check is
> >>>>>>> irrelevant and doesn't fall under the start-end segment range?
> >>>>>>>
> >>>>>> Please have a look at the attached version, specifically patch 0005.
> >>>>>> In astreamer_waldump_content(), I have moved the WAL file filtration
> >>>>>> check from get_archive_wal_entry(). This check will be skipped during
> >>>>>> the initial read in init_archive_reader(), which instead performs it
> >>>>>> explicitly once it determines the WAL segment size and the start/end
> >>>>>> segments.
> >>>>>>
> >>>>>> To access the WAL segment size inside astreamer_waldump_content(), I
> >>>>>> have moved the WAL segment size variable into the XLogDumpPrivate
> >>>>>> structure in the separate 0004 patch.
> >>>>> Attached is an updated version including the aforesaid changes. It
> >>>>> includes a new refactoring patch (0001) that moves the logic for
> >>>>> identifying tar archives and their compression types from
> >>>>> pg_basebackup and pg_verifybackup into a separate-reusable function,
> >>>>> per a suggestion from Euler [1]. Additionally, I have added a test
> >>>>> for the contrecord decoding to the main patch (now 0006).
> >>>>>
> >>>>> 1]http://postgr.es/m/3c8e7b02-2152-495a-a0b6-e37cf9286a70@app.fastmail.com
> >>>>>
> >>>> Rebased against the latest master, fixed typos in code comments, and
> >>>> replaced palloc0 with palloc0_object.
> >>>>
> >>> Hi Amul.
> >>>
> >>>
> >>> I think this looks in pretty good shape.
> >>>
> >> Thank you very much for looking at the patch.
> >>
> >>> Attached are patches for a few things I think could be fixed. They are
> >>> mostly self-explanatory. The TAP test fix is the only sane way I could
> >>> come up with stopping the skip code you had from reporting a wildly
> >>> inaccurate number of tests skipped. The sane way to do this from a
> >>> Test::More perspective is a subtest, but unfortunately meson does not
> >>> like subtest output, which is why we don't use it elsewhere, so the only
> >>> way I could come up with was to split this out into a separate test. Of
> >>> course, we might just say we don't care about the misreport, in which
> >>> case we could just live with things as they are.
> >>>
> >> I agree that the reported skip number was incorrect, and I have
> >> corrected it in the attached patch. I haven't applied your patch for
> >> the TAP test improvements yet because I wanted to double-check it
> >> first with you; the patch as it stood created duplicate tests already
> >> present in 001_basic.pl. To avoid this duplication, I have added a
> >> loop that performs tests for both plain and tar WAL directory inputs,
> >> similar to the approach used in pg_verifybackup for different
> >> compression type tests (e.g., 008_untar.pl, 010_client_untar.pl). I
> >> don't have any objection to doing so if you feel the duplication is
> >> acceptable, but I feel that using a loop for the tests in 001_basic.pl
> >> is a bit tidier. Let me know your thoughts.
> >
> >
> > I will take a look.
> >
>
> I'm ok, with doing it this way. It's just a bit fragile - if we add a
> test the number will be wrong. But maybe it's not worth worrying about.
>
> Everything else looks fairly good. The attached fixes a few relatively
> minor issues in v15. The main one is that it stops allocating/freeing a
> buffer every time we call read_archive_file() and instead adds a
> reusable buffer. It also adds back wal-directory as an undocumented
> alias of wal-path, to avoid breaking legacy scripts unnecessarily, and
> adds constness to the fname argument of pg_tar_compress_algorithm, as
> well as fixing some indentation and grammar issues.
>
> All in all I think we're in good shape.

Thanks for the review. I have incorporated your suggested changes,
with one exception: I have skipped the buffer reallocation code in
read_archive_file(). Since we only handle two specific read sizes --
XLOG_BLCKSZ and READ_CHUNK_SIZE (128 KB, we defined in
archive_waldump.c) -- dynamic reallocation seems unnecessary. Instead,
I moved the allocation to init_archive_reader(), which now initializes
a buffer at READ_CHUNK_SIZE. I also added an assertion in
read_archive_file() to ensure that no read request exceeds this
allocated capacity.

Kindly have a look at the attached version and let me know your thoughts.

Regards,
Amul

Attachment Content-Type Size
v16-0001-Refactor-Move-tar-archive-parsing-into-a-common-.patch application/x-patch 6.7 KB
v16-0002-Refactor-pg_waldump-Move-some-declarations-to-ne.patch application/x-patch 2.2 KB
v16-0003-Refactor-pg_waldump-Separate-logic-used-to-calcu.patch application/x-patch 2.4 KB
v16-0004-Refactor-pg_waldump-Restructure-TAP-tests.patch application/x-patch 6.6 KB
v16-0005-Refactor-pg_waldump-Move-WAL-segment-size-to-XLo.patch application/x-patch 5.1 KB
v16-0006-pg_waldump-Add-support-for-archived-WAL-decoding.patch application/x-patch 42.5 KB
v16-0007-pg_waldump-Remove-the-restriction-on-the-order-o.patch application/x-patch 13.8 KB
v16-0008-pg_verifybackup-Delay-default-WAL-directory-prep.patch application/x-patch 1.7 KB
v16-0009-pg_verifybackup-Rename-the-wal-directory-switch-.patch application/x-patch 5.9 KB
v16-0010-pg_verifybackup-Enabled-WAL-parsing-for-tar-form.patch application/x-patch 11.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Álvaro Herrera 2026-03-09 12:40:51 Re: Avoid resource leak (src/bin/pg_dump/pg_dumpall.c)
Previous Message Andrew Dunstan 2026-03-09 12:23:09 Re: Emitting JSON to file using COPY TO