| From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
|---|---|
| To: | Tomas Vondra <tomas(at)vondra(dot)me> |
| Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Amul Sul <sulamul(at)gmail(dot)com>, Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: pg_waldump: support decoding of WAL inside tarfile |
| Date: | 2026-03-29 22:11:50 |
| Message-ID: | CA+hUKGJyvdyWMC-RW1njqevD-q_gTbFq+DyDiFpUJVaG+DY20w@mail.gmail.com |
| Lists: | pgsql-hackers |
On Mon, Mar 30, 2026 at 2:33 AM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> On 3/29/26 00:12, Tom Lane wrote:
> > I've reproduced Thomas' failure on a local FreeBSD 15.0 image
> > using zfs, and confirmed that this cowboy hack fixes it:
> >
>
> Interesting. Then I guess it has to be due to some difference in ufs vs.
> zfs, when handling sparse files. It might be useful to add a bit more
> variation here, and switch some of the animals to non-default
> filesystems (not just the FreeBSD ones, which we seem to have only two
> that run reasonably often). I'd bet most of the linux systems run on
> ext4/xfs, few on btrfs/zfs.

UFS does have sparse files (its ancestor invented them some time
around (time_t) 0), it just doesn't make them unless you tell it to.
PostgreSQL only does that if you set wal_init_zero=false.

ZFS is different because it creates holes automagically when you write
zeroes, at least if compression is enabled so it has to scan all your
bytes anyway.
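
A rough way to check what a given filesystem reports is to write real zeroes and then walk the file with lseek(SEEK_DATA)/lseek(SEEK_HOLE), the same calls tar -S uses. The Python sketch below is illustrative only: data_extents and write_zero_filled are hypothetical helper names, it assumes a platform where os.SEEK_DATA and os.SEEK_HOLE exist (e.g. Linux or FreeBSD), and whether any holes show up depends entirely on the filesystem and its settings.

```python
import errno
import os
import sys


def write_zero_filled(path, size=1 << 20):
    """Write a file whose middle is explicit zero bytes (no truncate tricks)."""
    with open(path, "wb") as f:
        f.write(b"hello\n")
        f.write(b"\0" * (size - 12))  # real zeroes, written the slow way
        f.write(b"world\n")
        f.flush()
        os.fsync(f.fileno())  # assumption: some filesystems only report holes after a sync


def data_extents(path):
    """Return [(start, end), ...] byte ranges the filesystem reports as data."""
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:  # nothing but a hole up to EOF
                    break
                raise
            end = os.lseek(fd, start, os.SEEK_HOLE)  # EOF counts as a hole
            extents.append((start, end))
            offset = end
    finally:
        os.close(fd)
    return extents


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "zeroes.dat"
    write_zero_filled(target)
    print(data_extents(target))
```

A single extent covering the whole file means the filesystem kept the zeroes as data; several short extents mean it turned them into holes behind your back.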
I was curious to know if BTRFS does that too, or hides
zero-compression at some lower invisible level:
$ echo "hello" > 1MB-sparse.dat
$ truncate -s 512KB 1MB-sparse.dat
$ echo "world" >> 1MB-sparse.dat
$ truncate -s 1MB 1MB-sparse.dat
$ ls -l 1MB-sparse.dat
-rw-rw-r-- 1 tmunro tmunro 1000000 Mar 30 10:11 1MB-sparse.dat
$ du -hs 1MB-sparse.dat
8.0K 1MB-sparse.dat
$ strace tar -S -cf foo.tar 1MB-sparse.dat 2>&1 | grep seek
lseek(4, 0, SEEK_DATA) = 0
lseek(4, 0, SEEK_HOLE) = 4096
lseek(4, 4096, SEEK_DATA) = 512000
lseek(4, 512000, SEEK_HOLE) = 516096
lseek(4, 516096, SEEK_DATA) = -1 ENXIO (No such device or address)
... so that's a yes, lseek sees holes that we didn't ask it to make,
just like on ZFS, but the rest of this trace of GNU tar -S -cf is
interesting:
lseek(5, 0, SEEK_SET) = 0
lseek(5, 0, SEEK_SET) = 0
lseek(4, 0, SEEK_SET) = 0
lseek(4, 512000, SEEK_SET) = 512000
lseek(4, 1000000, SEEK_SET) = 1000000
It didn't write out PAX format! Instead it replicated the holes into
the tar file itself with SEEK_SET.
$ strings foo.tar | grep Sparse
You have to add --format=posix to enable the GNU behaviour that BSD
tar is emulating by default:
$ tar --format=posix -S -cf foo.tar 1MB-sparse.dat
$ strings foo.tar | grep Sparse
./GNUSparseFile.4190/1MB-sparse.dat
I expected GNU tar to be forced to do that if writing to non-seekable
output, eg "tar -S -c 1MB-sparse.dat | cat > foo.tar", but somehow it
manages to write out only ~10KB of plain ustar format that it is able
to restore to the full 1MB apparent size using some other trick, but
... ENOTIME, I dunno how it's doing that. Might be interesting to see
if pg_waldump can read it though, 'cause the bytes aren't all there.
BTW I confirmed that Apple tar does have -S by default too, it's just
that APFS doesn't make holes magically, so this test would presumably
have broken on a Mac if wal_init_zero had been set to off (not
tested).
Anyway, given the defaults, GNU tar + ZFS/BTRFS users must be pretty
unlikely to hit this in the wild, and the symptom is a confusing error
in a maintenance tool, not corruption, so I don't think this is a big
deal. I might still try teaching the astreamer code to understand PAX
1.0 when it sees it in the next cycle though, for the benefit of
FreeBSD users. A quick and dirty version could probably just unmangle
the name and skip the first block of data, since any valid WAL file
will not begin with a hole and valid WAL data will end at the first
hole and fail our verification, but of course a real implementation
should read the map properly[1]...
[1] https://www.gnu.org/software/tar/manual/html_node/PAX-1.html
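
To make the quick-and-dirty idea above a bit more concrete, here is a hedged Python sketch of the two pieces: undoing the GNUSparseFile.<pid> name mangling and reading the sparse map that sits in front of the member's data. It is based on my reading of the PAX 1.0 description in [1] (a newline-delimited list of decimal numbers, entry count first, then offset/size pairs, padded to a 512-byte block). The function names are hypothetical and this is not the astreamer code, just an illustration; a real implementation would take the name from the GNU.sparse.name pax header instead of guessing from the path.

```python
import posixpath
import re

BLOCK = 512  # tar block size


def unmangle_sparse_name(name):
    """Drop the 'GNUSparseFile.<pid>' directory component, if present."""
    head, base = posixpath.split(name)
    parent, marker = posixpath.split(head)
    if re.fullmatch(r"GNUSparseFile\.\d+", marker):
        if parent in ("", "."):
            return base
        return posixpath.join(parent, base)
    return name


def parse_sparse_map(payload):
    """Return (extents, data_start) for a PAX 1.0 sparse member payload.

    extents is a list of (offset, length) pairs; data_start is the offset
    in payload, rounded up to a block boundary, where file data begins.
    """
    pos = 0

    def next_number():
        nonlocal pos
        end = payload.index(b"\n", pos)
        value = int(payload[pos:end])
        pos = end + 1
        return value

    count = next_number()
    extents = []
    for _ in range(count):
        offset = next_number()
        length = next_number()
        extents.append((offset, length))
    data_start = -(-pos // BLOCK) * BLOCK  # round up to the next block
    return extents, data_start


if __name__ == "__main__":
    # Hand-built map for illustration, not bytes captured from a real archive.
    demo = b"2\n0\n4096\n512000\n4096\n".ljust(BLOCK, b"\0")
    print(unmangle_sparse_name("./GNUSparseFile.4190/1MB-sparse.dat"))
    print(parse_sparse_map(demo))
```

For a WAL segment the shortcut would amount to checking that the first extent starts at offset 0 and then reading from data_start, since everything after the first hole fails verification anyway.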