Re: pg_waldump: support decoding of WAL inside tarfile

From: Amul Sul <sulamul(at)gmail(dot)com>
To: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_waldump: support decoding of WAL inside tarfile
Date: 2025-09-25 08:18:13
Message-ID: CAAJ_b978BOyTxoH-PmMrBzxcJ1FiBXVq78Borzd1nYg+ghguZQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 12, 2025 at 4:25 PM Amul Sul <sulamul(at)gmail(dot)com> wrote:
>
> On Mon, Sep 8, 2025 at 7:07 PM Jakub Wartak
> <jakub(dot)wartak(at)enterprisedb(dot)com> wrote:
> >
> > On Tue, Aug 26, 2025 at 1:53 PM Amul Sul <sulamul(at)gmail(dot)com> wrote:
> > >
> > [..patch]
> >
> > Hi Amul!
> >
>
> Thanks for your review. I'm replying to a few of your comments now,
> but for the rest, I need to think about them. I'm kind of in agreement
> with some of them for the fix, but I won't be able to spend time on
> that next week due to official travel. I'll try to get back as soon as
> possible after that.
>

Reverting on rest of review comments:

> 0001: LGTM, maybe I would just slightly enhance the commit message
> ("This is in preparation for adding a second source file to this
> directory.") -- maye bit a bit more verbose or use a message from
> 0002?

Done.

> b. Why would it like to open "blah" dir if I wanted that "blah"
> segment from the archive? Shouldn't it tell that it was looking in the
> archive and couldn find it inside?
> $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar blah
> pg_waldump: error: could not open file "blah": Not a directory

Now, an error will be thrown if any additional command-line
arguments are provided when an archive is specified, similar to how
existing extra arguments are handled.

> i. If I give wrong --timeline=999 to pg_waldump it fails with
> misleading error message: could not read WAL data from "pg_wal.tar"
> archive: read -1 of 8192

Now., added a much better error message for that case.

> a. I'm wondering if we shouldn't log (to stderr?) some kind of
> notification message (just once) that non-sequential WAL files were
> discovered and that pg_waldump is starting to write to $somewhere as
> it may be causing bigger I/O than anticipated when running the
> command. This can easily help when troubleshooting why it is not fast,
> and also having set TMPDIR to usually /tmp can be slow or too small.

Now, emitting info messages, but I'm not sure whether we should have
info or debug.

> b. IMHO member_prepare_tmp_write() / get_tmp_wal_file_path() with
> TMPDIR can be prone to symlink attack. Consider setting TMPDIR=/tmp .
> We are writing to e.g. /tmp/<WALsegment>.waldump.tmp in 0004 , but
> that path is completely guessable. If an attacker prepares some
> symlinks and links those to some other places, I think the code will
> happily open and overwrite the contents of the rogue symlink. I think
> using mkstemp(3)/tmpfile(3) would be a safer choice if TMPDIR needs to
> be in play. Consider that pg_waldump can be run as root (there's no
> mechanism preventing it from being used that way).

I am not sure what the worst-case scenario would be or what a good
alternative is.

> c. IMHO that unlink() might be not guaranteed to always remove
> files, as in case of any trouble and exit() , those files might be
> left over. I think we need some atexit() handlers. This can be
> triggered with combo of options of nonsequential files in tar + wrong
> LSN given:

Done.

> 0007:
> a. Commit message says `Future patches to pg_waldump will enable
> it to decode WAL directly` , but those pg_waldump are earlier patches,
> right?

Right, fixed.

> b. pg_verifybackup should print some info with --progress that it
> is spawning pg_waldump (pg_verifybackup --progress mode does not
> display anything related to verifing WALs, but it could)

If we decide to do that, it could be a separate project, IMHO.

> c. I'm wondering, but pg_waldump seems to be not complaining if
> --end=LSN is made into such a future that it doesn't exist.

The behavior will be kept as if a directory was provided with a start
and end LSN.

Thanks again for the review. I'll post the new patches in my next reply.

Regards,
Amul

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Shubham Khanna 2025-09-25 08:18:40 Re: Add support for specifying tables in pg_createsubscriber.
Previous Message Daniel Gustafsson 2025-09-25 08:05:55 Re: Remove obsolate comments from 047_checkpoint_physical_slot