Re: pg_waldump: support decoding of WAL inside tarfile

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Amul Sul <sulamul(at)gmail(dot)com>, Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_waldump: support decoding of WAL inside tarfile
Date: 2026-03-29 22:11:50
Message-ID: CA+hUKGJyvdyWMC-RW1njqevD-q_gTbFq+DyDiFpUJVaG+DY20w@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 30, 2026 at 2:33 AM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> On 3/29/26 00:12, Tom Lane wrote:
> > I've reproduced Thomas' failure on a local FreeBSD 15.0 image
> > using zfs, and confirmed that this cowboy hack fixes it:
> >
>
> Interesting. Then I guess it has to be due to some difference in ufs vs.
> zfs, when handling sparse files. It might be useful to add a bit more
> variation here, and switch some of the animals to non-default
> filesystems (not just the FreeBSD ones, which we seem to have only two
> that run reasonably often). I'd bet most of the linux systems run on
> ext4/xfs, few on btrfs/zfs.

UFS does have sparse files (its ancestor invented them some time
around (time_t) 0), it just doesn't make them unless you tell it to.
PostgreSQL only does that if you set wal_init_zero=false.

ZFS is different because it creates holes automagically when you write
zeroes, at least if compression is enabled so it has to scan all your
bytes anyway.
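
One way to check what a given filesystem reports is to walk a file with SEEK_DATA/SEEK_HOLE. The following is only a minimal sketch, not anything from the patch: data_extents is a hypothetical helper name, and it assumes a platform where Python exposes os.SEEK_DATA and os.SEEK_HOLE. On filesystems without hole support, lseek typically reports the whole file as a single data extent.

```python
import errno
import os


def data_extents(path):
    """Return [(offset, length), ...] for the data regions lseek() reports.

    Hypothetical helper for illustration only.
    """
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        pos = 0
        while pos < size:
            try:
                # Jump to the next byte that is backed by data.
                start = os.lseek(fd, pos, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:
                    # Nothing but a hole from here to EOF.
                    break
                raise
            # Find where that data region ends (EOF counts as a hole).
            end = os.lseek(fd, start, os.SEEK_HOLE)
            extents.append((start, end - start))
            pos = end
    finally:
        os.close(fd)
    return extents
```

Calling data_extents() on a freshly written file of zeroes should show whether the filesystem turned them into holes: an empty list (or a short one) means yes, one extent covering the whole file means no.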

I was curious to know if BTRFS does that too, or hides
zero-compression at some lower invisible level:

$ echo "hello" > 1MB-sparse.dat
$ truncate -s 512KB 1MB-sparse.dat
$ echo "world" >> 1MB-sparse.dat
$ truncate -s 1MB 1MB-sparse.dat
$ ls -l 1MB-sparse.dat
-rw-rw-r-- 1 tmunro tmunro 1000000 Mar 30 10:11 1MB-sparse.dat
$ du -hs 1MB-sparse.dat
8.0K 1MB-sparse.dat
$ strace tar -S -cf foo.tar 1MB-sparse.dat 2>&1 | grep seek
lseek(4, 0, SEEK_DATA) = 0
lseek(4, 0, SEEK_HOLE) = 4096
lseek(4, 4096, SEEK_DATA) = 512000
lseek(4, 512000, SEEK_HOLE) = 516096
lseek(4, 516096, SEEK_DATA) = -1 ENXIO (No such device or address)

... so that's a yes, lseek sees holes that we didn't ask it to make,
just like on ZFS, but the rest of this trace of GNU tar -S -cf is
interesting:

lseek(5, 0, SEEK_SET) = 0
lseek(5, 0, SEEK_SET) = 0
lseek(4, 0, SEEK_SET) = 0
lseek(4, 512000, SEEK_SET) = 512000
lseek(4, 1000000, SEEK_SET) = 1000000

It didn't write out PAX format! Instead it replicated the holes into
the tar file itself with SEEK_SET.

$ strings foo.tar | grep Sparse

You have to add --format=posix to enable the GNU behaviour that BSD
tar is emulating by default:

$ tar --format=posix -S -cf foo.tar 1MB-sparse.dat
$ strings foo.tar | grep Sparse
./GNUSparseFile.4190/1MB-sparse.dat

I expected GNU tar to be forced to do that when writing to non-seekable
output, e.g. "tar -S -c 1MB-sparse.dat | cat > foo.tar", but somehow it
manages to write out only ~10KB of plain ustar format, which it can
still restore to the full 1MB apparent size using some other trick.
ENOTIME, I dunno how it's doing that. Might be interesting to see
if pg_waldump can read it though, 'cause the bytes aren't all there.

BTW I confirmed that Apple tar does have -S by default too, it's just
that APFS doesn't make holes magically, so this test would presumably
have broken on a Mac if wal_init_zero had been forced to off (not
tested).

Anyway, given the defaults, GNU tar + ZFS/BTRFS users must be pretty
unlikely to hit this in the wild, and the symptom is a confusing error
in a maintenance tool, not corruption, so I don't think this is a big
deal. I might still try teaching the astreamer code to understand PAX
1.0 when it sees it in the next cycle though, for the benefit of
FreeBSD users. A quick and dirty version could probably just unmangle
the name and skip the first block of data (where PAX 1.0 keeps the
sparse map), since a valid WAL file never begins with a hole, and valid
WAL data ends at the first hole, so anything after it would fail our
verification anyway. But of course a real implementation should read
the map properly[1]...

[1] https://www.gnu.org/software/tar/manual/html_node/PAX-1.html
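
For what it's worth, here is a rough sketch of what "reading the map properly" might involve, based on my reading of the layout described in [1]: the member's data starts with newline-delimited decimal numbers (an entry count, then offset/size pairs), padded out to a 512-byte block boundary, with the real data following. This is not astreamer code. parse_pax1_sparse_map and unmangle_name are hypothetical names, the map values in the usage note are illustrative, and stripping the GNUSparseFile.<pid> directory is only the quick-and-dirty shortcut; a real implementation would take the name from the GNU.sparse.name extended header.

```python
import re

BLOCK = 512  # tar block size


def parse_pax1_sparse_map(data):
    """Parse a PAX 1.0 sparse map from the start of a member's data.

    Returns (entries, data_start), where entries is a list of
    (offset, size) pairs and data_start is the byte offset at which
    the member's real data begins. Hypothetical helper.
    """
    pos = 0

    def read_number():
        nonlocal pos
        nl = data.index(b"\n", pos)
        value = int(data[pos:nl].decode("ascii"))
        pos = nl + 1
        return value

    count = read_number()
    entries = []
    for _ in range(count):
        offset = read_number()
        size = read_number()
        entries.append((offset, size))

    # The map is padded to the next block boundary.
    data_start = -(-pos // BLOCK) * BLOCK
    return entries, data_start


def unmangle_name(name):
    """Drop the GNUSparseFile.<pid> directory component (shortcut only)."""
    return re.sub(r"(^|/)GNUSparseFile\.\d+/", r"\1", name, count=1)
```

Feeding it a made-up map such as b"2\n0\n4096\n512000\n4096\n" padded to 512 bytes should yield two entries and data_start == 512, and unmangle_name("./GNUSparseFile.4190/1MB-sparse.dat") gives back "./1MB-sparse.dat".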
