| From: | Amul Sul <sulamul(at)gmail(dot)com> |
|---|---|
| To: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
| Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: pg_waldump: support decoding of WAL inside tarfile |
| Date: | 2026-03-04 12:52:00 |
| Message-ID: | CAAJ_b95k4yWsVSbptTS0SF3thZBuQURvFCCAKwwz-0yyZhbnTQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Wed, Mar 4, 2026 at 6:07 AM Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
>
> On 2026-03-02 Mo 8:00 AM, Amul Sul wrote:
> > On Wed, Feb 18, 2026 at 12:28 PM Amul Sul <sulamul(at)gmail(dot)com> wrote:
> >> On Tue, Feb 10, 2026 at 3:06 PM Amul Sul <sulamul(at)gmail(dot)com> wrote:
> >>> On Wed, Feb 4, 2026 at 6:39 PM Amul Sul <sulamul(at)gmail(dot)com> wrote:
> >>>> On Wed, Jan 28, 2026 at 2:41 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >>>>> On Tue, Jan 27, 2026 at 7:07 AM Amul Sul <sulamul(at)gmail(dot)com> wrote:
> >>>>>> In the attached version, I am using the WAL segment name as the hash
> >>>>>> key, which is much more straightforward. I have rewritten
> >>>>>> read_archive_wal_page(), and it looks much cleaner than before. The
> >>>>>> logic to discard irrelevant WAL files is still within
> >>>>>> get_archive_wal_entry. I added an explanation for setting cur_wal to
> >>>>>> NULL, which is now handled in the separate function I mentioned
> >>>>>> previously.
> >>>>>>
> >>>>>> Kindly have a look at the attached version; let me know if you are
> >>>>>> still not happy with the current approach for filtering/discarding
> >>>>>> irrelevant WAL segments. It isn't much different from the previous
> >>>>>> version, but I have tried to keep it in a separate routine for better
> >>>>>> code readability, with comments to make it easier to understand. I
> >>>>>> also added a comment for ArchivedWALFile.
> >>>>> I feel like the division of labor between get_archive_wal_entry() and
> >>>>> read_archive_wal_page() is odd. I noticed this in the last version,
> >>>>> too, and it still seems to be the case. get_archive_wal_entry() first
> >>>>> calls ArchivedWAL_lookup(). If that finds an entry, it just returns.
> >>>>> If it doesn't, it loops until an entry for the requested file shows up
> >>>>> and then returns it. Then control returns to read_archive_wal_page()
> >>>>> which loops some more until we have all the data we need for the
> >>>>> requested file. But it seems odd to me to have two separate loops
> >>>>> here. I think that the first loop is going to call read_archive_file()
> >>>>> until we find the beginning of the file that we care about and then
> >>>>> the second one is going to call read_archive_file() some more until we
> >>>>> have read enough of it to satisfy the request. It feels odd to me to
> >>>>> do it that way, as if we told somebody to first wait until 9 o'clock
> >>>>> and then wait another 30 minutes, instead of just telling them to wait
> >>>>> until 9:30. I realize it's not quite the same thing, because apart
> >>>>> from calling read_archive_file(), the two loops do different things,
> >>>>> but I still think it looks odd.
> >>>>>
> >>>>> + /*
> >>>>> + * Ignore if the timeline is different or the current segment is not
> >>>>> + * the desired one.
> >>>>> + */
> >>>>> + XLogFromFileName(entry->fname, &curSegTimeline, &curSegNo, WalSegSz);
> >>>>> + if (privateInfo->timeline != curSegTimeline ||
> >>>>> + privateInfo->startSegNo > curSegNo ||
> >>>>> + privateInfo->endSegNo < curSegNo ||
> >>>>> + segno > curSegNo)
> >>>>> + {
> >>>>> + free_archive_wal_entry(entry->fname, privateInfo);
> >>>>> + continue;
> >>>>> + }
> >>>>>
> >>>>> The comment doesn't match the code. If it did, the test would be
> >>>>> (privateInfo->timeline != curSegTimeline || segno != curSegno). But
> >>>>> instead the segno test is > rather than !=, and the checks against
> >>>>> startSegNo and endSegNo aren't explained at all. I think I understand
> >>>>> why the segno test uses > rather than !=, but it's the point of the
> >>>>> comment to explain things like that, rather than leaving the reader to
> >>>>> guess. And I don't know why we also need to test startSegNo and
> >>>>> endSegNo.
> >>>>>
> >>>>> I also wonder what the point is of doing XLogFromFileName() on the
> >>>>> fname provided by the caller and then again on entry->fname. Couldn't
> >>>>> you just compare the strings?
> >>>>>
> >>>>> Again, the division of labor is really odd here. It's the job of
> >>>>> astreamer_waldump_content() to skip things that aren't WAL files at
> >>>>> all, but it's the job of get_archive_wal_entry() to skip things that
> >>>>> are WAL files but not the one we want. I disagree with putting those
> >>>>> checks in completely separate parts of the code.
> >>>>>
> >>>> Keeping the timeline and segment start-end range checks inside the
> >>>> archive streamer creates a circular dependency that cannot be resolved
> >>>> without a 'dirty hack'. We must read the first available WAL file page
> >>>> to determine the wal_segment_size before it can calculate the target
> >>>> segment range. Moving the checks inside the streamer would make it
> >>>> impossible to process that initial file, as the necessary filtering
> >>>> parameters -- would still be unknown which would need to be skipped
> >>>> for the first read somehow. What if later we realized that the first
> >>>> WAL file which was allowed to be streamed by skipping that check is
> >>>> irrelevant and doesn't fall under the start-end segment range?
> >>>>
> >>> Please have a look at the attached version, specifically patch 0005.
> >>> In astreamer_waldump_content(), I have moved the WAL file filtration
> >>> check from get_archive_wal_entry(). This check will be skipped during
> >>> the initial read in init_archive_reader(), which instead performs it
> >>> explicitly once it determines the WAL segment size and the start/end
> >>> segments.
> >>>
> >>> To access the WAL segment size inside astreamer_waldump_content(), I
> >>> have moved the WAL segment size variable into the XLogDumpPrivate
> >>> structure in the separate 0004 patch.
> >> Attached is an updated version including the aforesaid changes. It
> >> includes a new refactoring patch (0001) that moves the logic for
> >> identifying tar archives and their compression types from
> >> pg_basebackup and pg_verifybackup into a separate-reusable function,
> >> per a suggestion from Euler [1]. Additionally, I have added a test
> >> for the contrecord decoding to the main patch (now 0006).
> >>
> >> 1] http://postgr.es/m/3c8e7b02-2152-495a-a0b6-e37cf9286a70@app.fastmail.com
> >>
> > Rebased against the latest master, fixed typos in code comments, and
> > replaced palloc0 with palloc0_object.
> >
>
> Hi Amul.
>
>
> I think this looks in pretty good shape.
>
Thank you very much for looking at the patch.
> Attached are patches for a few things I think could be fixed. They are
> mostly self-explanatory. The TAP test fix is the only sane way I could
> come up with stopping the skip code you had from reporting a wildly
> inaccurate number of tests skipped. The sane way to do this from a
> Test::More perspective is a subtest, but unfortunately meson does not
> like subtest output, which is why we don't use it elsewhere, so the only
> way I could come up with was to split this out into a separate test. Of
> course, we might just say we don't care about the misreport, in which
> case we could just live with things as they are.
>
I agree that the reported skip number was incorrect, and I have
corrected it in the attached patch. I haven't applied your patch for
the TAP test improvements yet because I wanted to double-check it
first with you; the patch as it stood created duplicate tests already
present in 001_basic.pl. To avoid this duplication, I have added a
loop that performs tests for both plain and tar WAL directory inputs,
similar to the approach used in pg_verifybackup for different
compression type tests (e.g., 008_untar.pl, 010_client_untar.pl). I
don't have any objection to doing so if you feel the duplication is
acceptable, but I feel that using a loop for the tests in 001_basic.pl
is a bit tidier. Let me know your thoughts.
I have applied all your other patches but skipped the changes to
pg_verifybackup.c from cf5955-fixes.patch.no-cfbot, as they seem
unrelated or perhaps I have misunderstood them.
Regards,
Amul
| Attachment | Content-Type | Size |
|---|---|---|
| v15-0001-Refactor-Move-tar-archive-parsing-into-a-common-.patch | application/x-patch | 6.7 KB |
| v15-0002-Refactor-pg_waldump-Move-some-declarations-to-ne.patch | application/x-patch | 2.2 KB |
| v15-0003-Refactor-pg_waldump-Separate-logic-used-to-calcu.patch | application/x-patch | 2.4 KB |
| v15-0004-Refactor-pg_waldump-Restructure-TAP-tests.patch | application/x-patch | 6.6 KB |
| v15-0005-Refactor-pg_waldump-Move-WAL-segment-size-to-XLo.patch | application/x-patch | 5.1 KB |
| v15-0006-pg_waldump-Add-support-for-archived-WAL-decoding.patch | application/x-patch | 41.8 KB |
| v15-0007-pg_waldump-Remove-the-restriction-on-the-order-o.patch | application/x-patch | 13.8 KB |
| v15-0008-pg_verifybackup-Delay-default-WAL-directory-prep.patch | application/x-patch | 1.7 KB |
| v15-0009-pg_verifybackup-Rename-the-wal-directory-switch-.patch | application/x-patch | 5.9 KB |
| v15-0010-pg_verifybackup-Enabled-WAL-parsing-for-tar-form.patch | application/x-patch | 11.5 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Akshay Joshi | 2026-03-04 12:59:53 | Re: [PATCH] Add pg_get_database_ddl() function to reconstruct CREATE DATABASE statement |
| Previous Message | Matheus Alcantara | 2026-03-04 12:17:21 | Re: postgres_fdw: Use COPY to speed up batch inserts |