Re: pg_waldump: support decoding of WAL inside tarfile

From: Amul Sul <sulamul(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_waldump: support decoding of WAL inside tarfile
Date: 2026-03-21 17:26:38
Message-ID: CAAJ_b96d0imBJ9Qm93oe40bEZ_4s-u2stM-JB0LYQC5GcjVW-w@mail.gmail.com
Lists: pgsql-hackers

On Sat, Mar 21, 2026 at 9:05 PM Amul Sul <sulamul(at)gmail(dot)com> wrote:
>
> On Sat, Mar 21, 2026 at 5:51 PM Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> >
> >
> > On 2026-03-21 Sa 2:34 AM, Tom Lane wrote:
> >
> > Michael Paquier <michael(at)paquier(dot)xyz> writes:
> >
> > On Fri, Mar 20, 2026 at 11:49:02PM -0400, Tom Lane wrote:
> >
> > Buildfarm members batta and hachi don't like this very much.
> >
> > I did not look at what's happening on the host, but it seems like a
> > safe bet to assume that we are not seeing many failures in the
> > buildfarm because we don't have many animals that have the idea to add
> > --with-zstd to their build configuration, like these two ones.
> >
> > That may be part of the story, but only part. I spent a good deal of
> > time trying to reproduce batta & hachi's configurations locally, on
> > several different platforms, but still couldn't duplicate what they
> > are showing.
> >
> > Yeah, I haven't been able to reproduce it either. But while
> > investigating I found a couple of issues. We neglected to add one of
> > the tests to meson.build, and we neglected to close some files,
> > causing errors on Windows.
> >
>
> While the proposed fix of closing the file pointer before returning is
> correct, we also need to ensure the file is reopened in the next call
> to spill any remaining buffered data. I’ve made a small update to
> Andrew's 0001 patch to handle this. Also, changes to meson.build don't
> seem to be needed as we haven't committed that file yet (unless I am
> missing something).
>
> I’ve also reattached the other patches so they don't get lost: v2-0002
> is Andrew's patch for the archive streamer, and v2-0003 is the patch I
> posted previously [1].
>
>

On further thought, I don't think v2-0001 is the right fix. Consider
the case where a temporary file has been only partially written: if the
next segment required for decoding is that same segment,
TarWALDumpReadPage() will find the physical file present and continue
decoding from it, potentially triggering an error later because the
file is shorter than expected.

I have attached the v3-0001 patch, which ensures that once we start
writing a temporary file, we finish writing it before performing the
lookup. This ensures we don't leave a partial file on disk.
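
To make the hazard concrete, here is a minimal sketch in Python. It is
not the pg_waldump code and does not reflect the patch's actual
structure: SpillState, lookup_naive(), lookup_finish_first(), the file
name "seg.tmp" and the toy 16-byte segment are all hypothetical names
chosen for illustration. It only models the idea that a lookup based
on file existence can pick up a short file, and that finishing the
in-progress write first avoids that.

```python
import os
import tempfile

SEGMENT_SIZE = 16  # toy size (assumption); real WAL segments are far larger


class SpillState:
    """Hypothetical model of one segment being spilled to a temporary file."""

    def __init__(self, directory, name, data):
        self.path = os.path.join(directory, name)
        self.data = data
        self.written = 0

    def write_some(self, nbytes):
        # Append the next chunk and close the file again, so that no
        # file handle stays open between calls.
        chunk = self.data[self.written:self.written + nbytes]
        with open(self.path, "ab") as f:
            f.write(chunk)
        self.written += len(chunk)

    def finish(self):
        # Drain everything still pending so the file on disk is complete.
        while self.written < len(self.data):
            self.write_some(4)

    @property
    def in_progress(self):
        return 0 < self.written < len(self.data)


def lookup_naive(spill):
    # Existence-only check: a partially written file looks usable.
    if os.path.exists(spill.path):
        with open(spill.path, "rb") as f:
            return f.read()
    return None


def lookup_finish_first(spill):
    # Complete any in-progress spill before trusting the file on disk.
    if spill.in_progress:
        spill.finish()
    return lookup_naive(spill)


def demo():
    segment = bytes(range(SEGMENT_SIZE))
    with tempfile.TemporaryDirectory() as d:
        spill = SpillState(d, "seg.tmp", segment)
        spill.write_some(4)                # partial write, then we return
        short = lookup_naive(spill)        # sees only the 4 bytes on disk
        full = lookup_finish_first(spill)  # completes the file, then reads
    return len(short), len(full)
```

In this model demo() returns (4, 16): the existence-only lookup reads
a 4-byte file, while the finish-first lookup reads the whole segment.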

Updated patches are attached; 0002 and 0003 remain the same as before.

Regards,
Amul

Attachment                                                       Content-Type         Size
v3-0001-archive_waldump-skip-hash-lookup-and-tighten-writ.patch  application/x-patch  1.8 KB
v3-0002-Fix-astreamer-decompressor-finalize-to-send-corre.patch  application/x-patch  2.5 KB
v3-0003-pg_waldump-Handle-archive-exhaustion-in-init_arch.patch  application/x-patch  3.3 KB
