Re: pg_waldump: support decoding of WAL inside tarfile

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(at)vondra(dot)me>, Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Amul Sul <sulamul(at)gmail(dot)com>, Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_waldump: support decoding of WAL inside tarfile
Date: 2026-04-02 01:22:44
Message-ID: 3118179.1775092964@sss.pgh.pa.us
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> On Thu, Apr 2, 2026 at 7:25 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Also, if we are admitting the possibility that what we are reading
>> was made by a platform-supplied tar and not our own code, I think
>> it verges on lunacy to behave as though unsupported typeflags are
>> regular files.

> Yeah, if this is the first time we parse files we didn't make then
> that makes total sense. I was a bit unsure of that question when I
> suggested we reject pax only after we've failed to find a file, in
> case there are scenarios that work today with harmless ignorable pax
> headers that don't change the file name.

Of course this is all new so far as pg_waldump is concerned.
I'm a bit unclear about whether pg_verifybackup's exposure
is large enough to warrant back-patching any of this.

Looking again at astreamer_tar.c, I suddenly realized that it doesn't
do any meaningful input validation. So if you feed it junk input,
you get garbage errors that aren't even predictable:

$ head -c 32768 /dev/urandom >junk.tar
$ pg_waldump --path junk.tar --start 1/0x10000000
pg_waldump: error: COPY stream ended before last file was finished
$ head -c 32768 /dev/urandom >junk.tar
$ pg_waldump --path junk.tar --start 1/0x10000000
pg_waldump: error: tar member has empty name
$ head -c 32768 /dev/urandom >junk.tar
$ pg_waldump --path junk.tar --start 1/0x10000000
pg_waldump: error: COPY stream ended before last file was finished
$ head -c 32768 /dev/urandom >junk.tar
$ pg_waldump --path junk.tar --start 1/0x10000000
pg_waldump: error: could not find WAL in archive "junk.tar"

tar itself is considerably saner:

$ tar tf junk.tar
tar: This does not look like a tar archive
tar: Skipping to next header
tar: Exiting with failure status due to previous errors

So I think we need something like the attached, in addition
to what I sent before. This just makes astreamer_tar.c use
the isValidTarHeader function that pg_dump already had.
(I decided to const-ify isValidTarHeader's argument while
moving it to a shared location, which in turn requires
const-ifying tarChecksum.)

regards, tom lane

Attachment Content-Type Size
v1-validate-tar-headers.patch text/x-diff 5.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message shihao zhong 2026-04-02 01:54:25 Re: [Patch] New pg_stat_tablespace view
Previous Message Andreas Karlsson 2026-04-02 01:16:33 Re: Check some unchecked fclose() results