| From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | Tomas Vondra <tomas(at)vondra(dot)me>, Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Amul Sul <sulamul(at)gmail(dot)com>, Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: pg_waldump: support decoding of WAL inside tarfile |
| Date: | 2026-04-01 02:05:53 |
| Message-ID: | CA+hUKG+Pqz5=YQG_=8ho0YsTfn2HWOsJQWqS4j0q8QQWweJP9w@mail.gmail.com |
| Lists: | pgsql-hackers |
On Mon, Mar 30, 2026 at 11:23 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> > Anyway, given the defaults, GNU tar + ZFS/BTRFS users must be pretty
> > unlikely to hit this in the wild, and the symptom is a confusing error
> > in a maintenance tool, not corruption, so I don't think this is a big
> > deal. I might still try teaching the astreamer code to understand PAX
> > 1.0 when it sees it in the next cycle though, for the benefit of
> > FreeBSD users.
>
> I agree that this isn't too critical if the effects are confined to
> pg_waldump. I believe that pg_basebackup and pg_verifybackup also use
> astreamer_tar.c, but it's not clear to me if they'd ever be asked to
> parse files made by tar(1) and not by our own sparseness-ignorant
> tar-writing code. If they can be, that'd be a higher-priority reason
> to fill in this gap.

I pushed the workaround for the test.

Yeah I can't see any reason why pg_verifybackup --wal-path=foo.tar
won't suffer the same problem in the wild. Again, it's not the end of
the world because it'll just fail and you'll probably eventually
figure out why. So perhaps we should just improve our detection of
archives that we can't handle? Straw man algorithm:

If you can't find $NAME in the archive, then check if PaxHeaders/$NAME
exists, and if so, fail with 'unsupported TAR format for WAL file "%s"
in archive "%s"' instead. That'd probably work well enough in
practice, because astreamer_tar.c treats PAX extended header
pseudo-files as regular files (they're not, they have type 'x'), and
both GNU and BSD tar happen to use that naming.

POSIX doesn't require that naming, so it would in theory be more
correct to teach astreamer_tar.c to recognise PAX extended headers and
fish out enough information and link it to the following archive
member, but a simple test to improve error messaging seems like the
right level of effort here.
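
A minimal sketch of the straw man check above, in C, assuming the member names of the archive have already been collected into an array. The names WalMemberStatus, classify_wal_member, skip_dot_slash and format_unsupported_error are hypothetical and are not part of astreamer_tar.c. Ignoring a leading "./" on member names is also an assumption about how a tar writer might spell them, not something stated above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result codes for this sketch; not PostgreSQL identifiers. */
typedef enum
{
	WAL_MEMBER_FOUND,				/* $NAME is present in the archive */
	WAL_MEMBER_UNSUPPORTED_FORMAT,	/* only PaxHeaders/$NAME is present */
	WAL_MEMBER_MISSING				/* neither is present */
} WalMemberStatus;

/* Assumption: ignore a leading "./" in front of a member name. */
static const char *
skip_dot_slash(const char *member)
{
	if (strncmp(member, "./", 2) == 0)
		return member + 2;
	return member;
}

/*
 * Look for "name" among the member names.  If it is absent but a member
 * called PaxHeaders/name exists, report an unsupported archive format
 * rather than a plain "not found".
 */
static WalMemberStatus
classify_wal_member(const char *const *members, size_t nmembers,
					const char *name)
{
	char		pax_name[1024];	/* ample for 24-character WAL names */
	int			saw_pax_header = 0;
	size_t		i;

	snprintf(pax_name, sizeof(pax_name), "PaxHeaders/%s", name);

	for (i = 0; i < nmembers; i++)
	{
		const char *member = skip_dot_slash(members[i]);

		if (strcmp(member, name) == 0)
			return WAL_MEMBER_FOUND;
		if (strcmp(member, pax_name) == 0)
			saw_pax_header = 1;
	}

	return saw_pax_header ? WAL_MEMBER_UNSUPPORTED_FORMAT : WAL_MEMBER_MISSING;
}

/* Format the error text suggested above into buf. */
static void
format_unsupported_error(char *buf, size_t buflen,
						 const char *name, const char *archive)
{
	snprintf(buf, buflen,
			 "unsupported TAR format for WAL file \"%s\" in archive \"%s\"",
			 name, archive);
}
```

A caller would report the existing "could not find WAL" error for WAL_MEMBER_MISSING and the new message for WAL_MEMBER_UNSUPPORTED_FORMAT. Because the check keys on a naming convention rather than on the type 'x' header itself, it is only a heuristic, as noted above.
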
Here's a test patch that shows the problem on any system with GNU tar
or BSD tar and a file system that supports sparse files. The test
succeeds because it looks for "error: could not find WAL", but the idea
would be to change it to look for a new error message like the one suggested above. My
motivation was to make this reproducible on any system, in case that's
helpful for Amul and Andrew if they're interested in trying to improve
this edge case in time for the release. Otherwise I'll come back to
it, but probably not in time...
| Attachment | Content-Type | Size |
|---|---|---|
| 0001-Add-a-pg_waldump-test-with-GNU-tar-PAX-format.patch | text/x-patch | 4.2 KB |