| From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
|---|---|
| To: | Amul Sul <sulamul(at)gmail(dot)com> |
| Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: pg_waldump: support decoding of WAL inside tarfile |
| Date: | 2026-03-04 21:50:30 |
| Message-ID: | d84d43bb-6d9a-477e-93bd-1f61a93451cc@dunslane.net |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On 2026-03-04 We 7:52 AM, Amul Sul wrote:
> On Wed, Mar 4, 2026 at 6:07 AM Andrew Dunstan<andrew(at)dunslane(dot)net> wrote:
>>
>> On 2026-03-02 Mo 8:00 AM, Amul Sul wrote:
>>> On Wed, Feb 18, 2026 at 12:28 PM Amul Sul<sulamul(at)gmail(dot)com> wrote:
>>>> On Tue, Feb 10, 2026 at 3:06 PM Amul Sul<sulamul(at)gmail(dot)com> wrote:
>>>>> On Wed, Feb 4, 2026 at 6:39 PM Amul Sul<sulamul(at)gmail(dot)com> wrote:
>>>>>> On Wed, Jan 28, 2026 at 2:41 AM Robert Haas<robertmhaas(at)gmail(dot)com> wrote:
>>>>>>> On Tue, Jan 27, 2026 at 7:07 AM Amul Sul<sulamul(at)gmail(dot)com> wrote:
>>>>>>>> In the attached version, I am using the WAL segment name as the hash
>>>>>>>> key, which is much more straightforward. I have rewritten
>>>>>>>> read_archive_wal_page(), and it looks much cleaner than before. The
>>>>>>>> logic to discard irrelevant WAL files is still within
>>>>>>>> get_archive_wal_entry. I added an explanation for setting cur_wal to
>>>>>>>> NULL, which is now handled in the separate function I mentioned
>>>>>>>> previously.
>>>>>>>>
>>>>>>>> Kindly have a look at the attached version; let me know if you are
>>>>>>>> still not happy with the current approach for filtering/discarding
>>>>>>>> irrelevant WAL segments. It isn't much different from the previous
>>>>>>>> version, but I have tried to keep it in a separate routine for better
>>>>>>>> code readability, with comments to make it easier to understand. I
>>>>>>>> also added a comment for ArchivedWALFile.
>>>>>>> I feel like the division of labor between get_archive_wal_entry() and
>>>>>>> read_archive_wal_page() is odd. I noticed this in the last version,
>>>>>>> too, and it still seems to be the case. get_archive_wal_entry() first
>>>>>>> calls ArchivedWAL_lookup(). If that finds an entry, it just returns.
>>>>>>> If it doesn't, it loops until an entry for the requested file shows up
>>>>>>> and then returns it. Then control returns to read_archive_wal_page()
>>>>>>> which loops some more until we have all the data we need for the
>>>>>>> requested file. But it seems odd to me to have two separate loops
>>>>>>> here. I think that the first loop is going to call read_archive_file()
>>>>>>> until we find the beginning of the file that we care about and then
>>>>>>> the second one is going to call read_archive_file() some more until we
>>>>>>> have read enough of it to satisfy the request. It feels odd to me to
>>>>>>> do it that way, as if we told somebody to first wait until 9 o'clock
>>>>>>> and then wait another 30 minutes, instead of just telling them to wait
>>>>>>> until 9:30. I realize it's not quite the same thing, because apart
>>>>>>> from calling read_archive_file(), the two loops do different things,
>>>>>>> but I still think it looks odd.
>>>>>>>
>>>>>>> + /*
>>>>>>> + * Ignore if the timeline is different or the current segment is not
>>>>>>> + * the desired one.
>>>>>>> + */
>>>>>>> + XLogFromFileName(entry->fname, &curSegTimeline, &curSegNo, WalSegSz);
>>>>>>> + if (privateInfo->timeline != curSegTimeline ||
>>>>>>> + privateInfo->startSegNo > curSegNo ||
>>>>>>> + privateInfo->endSegNo < curSegNo ||
>>>>>>> + segno > curSegNo)
>>>>>>> + {
>>>>>>> + free_archive_wal_entry(entry->fname, privateInfo);
>>>>>>> + continue;
>>>>>>> + }
>>>>>>>
>>>>>>> The comment doesn't match the code. If it did, the test would be
>>>>>>> (privateInfo->timeline != curSegTimeline || segno != curSegno). But
>>>>>>> instead the segno test is > rather than !=, and the checks against
>>>>>>> startSegNo and endSegNo aren't explained at all. I think I understand
>>>>>>> why the segno test uses > rather than !=, but it's the point of the
>>>>>>> comment to explain things like that, rather than leaving the reader to
>>>>>>> guess. And I don't know why we also need to test startSegNo and
>>>>>>> endSegNo.
>>>>>>>
>>>>>>> I also wonder what the point is of doing XLogFromFileName() on the
>>>>>>> fname provided by the caller and then again on entry->fname. Couldn't
>>>>>>> you just compare the strings?
>>>>>>>
>>>>>>> Again, the division of labor is really odd here. It's the job of
>>>>>>> astreamer_waldump_content() to skip things that aren't WAL files at
>>>>>>> all, but it's the job of get_archive_wal_entry() to skip things that
>>>>>>> are WAL files but not the one we want. I disagree with putting those
>>>>>>> checks in completely separate parts of the code.
>>>>>>>
>>>>>> Keeping the timeline and segment start-end range checks inside the
>>>>>> archive streamer creates a circular dependency that cannot be resolved
>>>>>> without a 'dirty hack'. We must read the first available WAL file page
>>>>>> to determine the wal_segment_size before it can calculate the target
>>>>>> segment range. Moving the checks inside the streamer would make it
>>>>>> impossible to process that initial file, as the necessary filtering
>>>>>> parameters -- would still be unknown which would need to be skipped
>>>>>> for the first read somehow. What if later we realized that the first
>>>>>> WAL file which was allowed to be streamed by skipping that check is
>>>>>> irrelevant and doesn't fall under the start-end segment range?
>>>>>>
>>>>> Please have a look at the attached version, specifically patch 0005.
>>>>> In astreamer_waldump_content(), I have moved the WAL file filtration
>>>>> check from get_archive_wal_entry(). This check will be skipped during
>>>>> the initial read in init_archive_reader(), which instead performs it
>>>>> explicitly once it determines the WAL segment size and the start/end
>>>>> segments.
>>>>>
>>>>> To access the WAL segment size inside astreamer_waldump_content(), I
>>>>> have moved the WAL segment size variable into the XLogDumpPrivate
>>>>> structure in the separate 0004 patch.
>>>> Attached is an updated version including the aforesaid changes. It
>>>> includes a new refactoring patch (0001) that moves the logic for
>>>> identifying tar archives and their compression types from
>>>> pg_basebackup and pg_verifybackup into a separate-reusable function,
>>>> per a suggestion from Euler [1]. Additionally, I have added a test
>>>> for the contrecord decoding to the main patch (now 0006).
>>>>
>>>> 1]http://postgr.es/m/3c8e7b02-2152-495a-a0b6-e37cf9286a70@app.fastmail.com
>>>>
>>> Rebased against the latest master, fixed typos in code comments, and
>>> replaced palloc0 with palloc0_object.
>>>
>> Hi Amul.
>>
>>
>> I think this looks in pretty good shape.
>>
> Thank you very much for looking at the patch.
>
>> Attached are patches for a few things I think could be fixed. They are
>> mostly self-explanatory. The TAP test fix is the only sane way I could
>> come up with stopping the skip code you had from reporting a wildly
>> inaccurate number of tests skipped. The sane way to do this from a
>> Test::More perspective is a subtest, but unfortunately meson does not
>> like subtest output, which is why we don't use it elsewhere, so the only
>> way I could come up with was to split this out into a separate test. Of
>> course, we might just say we don't care about the misreport, in which
>> case we could just live with things as they are.
>>
> I agree that the reported skip number was incorrect, and I have
> corrected it in the attached patch. I haven't applied your patch for
> the TAP test improvements yet because I wanted to double-check it
> first with you; the patch as it stood created duplicate tests already
> present in 001_basic.pl. To avoid this duplication, I have added a
> loop that performs tests for both plain and tar WAL directory inputs,
> similar to the approach used in pg_verifybackup for different
> compression type tests (e.g., 008_untar.pl, 010_client_untar.pl). I
> don't have any objection to doing so if you feel the duplication is
> acceptable, but I feel that using a loop for the tests in 001_basic.pl
> is a bit tidier. Let me know your thoughts.
I will take a look.
>
> I have applied all your other patches but skipped the changes to
> pg_verifybackup.c from cf5955-fixes.patch.no-cfbot, as they seem
> unrelated or perhaps I have misunderstood them.
<brown-paper-bag> That's what I get for using a poorly written tool.
cheers
andrew
--
Andrew Dunstan
EDB:https://www.enterprisedb.com
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Corey Huinker | 2026-03-04 21:57:48 | Re: Add expressions to pg_restore_extended_stats() |
| Previous Message | Sami Imseih | 2026-03-04 21:33:13 | Re: Cleaning up PREPARE query strings? |