| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
| Cc: | Tomas Vondra <tomas(at)vondra(dot)me>, Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Amul Sul <sulamul(at)gmail(dot)com>, Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: pg_waldump: support decoding of WAL inside tarfile |
| Date: | 2026-04-02 01:22:44 |
| Message-ID: | 3118179.1775092964@sss.pgh.pa.us |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> On Thu, Apr 2, 2026 at 7:25 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Also, if we are admitting the possibility that what we are reading
>> was made by a platform-supplied tar and not our own code, I think
>> it verges on lunacy to behave as though unsupported typeflags are
>> regular files.
> Yeah, if this is the first time we parse files we didn't make then
> that makes total sense. I was a bit unsure of that question when I
> suggested we reject pax only after we've failed to find a file, in
> case there are scenarios that work today with harmless ignorable pax
> headers that don't change the file name.
Of course this is all new so far as pg_waldump is concerned.
I'm a bit unclear about whether pg_verifybackup's exposure
is large enough to warrant back-patching any of this.
Looking again at astreamer_tar.c, I suddenly realized that it doesn't
do any meaningful input validation. So if you feed it junk input,
you get garbage errors that aren't even predictable:
$ head -c 32768 /dev/urandom >junk.tar
$ pg_waldump --path junk.tar --start 1/0x10000000
pg_waldump: error: COPY stream ended before last file was finished
$ head -c 32768 /dev/urandom >junk.tar
$ pg_waldump --path junk.tar --start 1/0x10000000
pg_waldump: error: tar member has empty name
$ head -c 32768 /dev/urandom >junk.tar
$ pg_waldump --path junk.tar --start 1/0x10000000
pg_waldump: error: COPY stream ended before last file was finished
$ head -c 32768 /dev/urandom >junk.tar
$ pg_waldump --path junk.tar --start 1/0x10000000
pg_waldump: error: could not find WAL in archive "junk.tar"
tar itself is considerably saner:
$ tar tf junk.tar
tar: This does not look like a tar archive
tar: Skipping to next header
tar: Exiting with failure status due to previous errors
So I think we need something like the attached, in addition
to what I sent before. This just makes astreamer_tar.c use
the isValidTarHeader function that pg_dump already had.
(I decided to const-ify isValidTarHeader's argument while
moving it to a shared location, which in turn requires
const-ifying tarChecksum.)
regards, tom lane
| Attachment | Content-Type | Size |
|---|---|---|
| v1-validate-tar-headers.patch | text/x-diff | 5.1 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | shihao zhong | 2026-04-02 01:54:25 | Re: [Patch] New pg_stat_tablespace view |
| Previous Message | Andreas Karlsson | 2026-04-02 01:16:33 | Re: Check some unchecked fclose() results |