Re: pg_waldump: support decoding of WAL inside tarfile

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Amul Sul <sulamul(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_waldump: support decoding of WAL inside tarfile
Date: 2026-03-21 19:31:10
Message-ID: 2431968.1774121470@sss.pgh.pa.us
Lists: pgsql-hackers

I don't like the v3 patches too much: in particular, they do nothing
for the failure-to-finalize bug I identified yesterday. v3-0003
is on the right track but seems overcomplicated. Here is my own
set of proposed patches, which I think will fix what we are seeing
in the buildfarm:

v4-0001 is Andrew's fix for incorrect decompressor finalization.

v4-0002 fixes read_archive_file() to do the missing finalize step.

v4-0003 fixes init_archive_reader() to not depend on cur_file.
This is closely allied to v3-0003 but simpler. I also added
some commentary to pg_waldump.h about what it's safe to do with
cur_file.

get_archive_wal_entry() violates that advice and is pretty much
utterly broken IMO, because it still believes that it can use cur_file
in an incorrect way. However, the impact of that is that it may fail
to flush some hashtable entries out to temp files (in case a single
read_archive_file() step reads more than one WAL file, which is
entirely possible with compression). That is a performance issue but
it's not causing our buildfarm problems, so I left it untouched here.
But I don't think any of the patches proposed so far fix it properly.
What it should do is closer to the revised version of
init_archive_reader's loop: call read_archive_file(), then scan the
hash table for WAL entries we need to flush to files, then finally
return the desired WAL entry if it's present, else loop around.
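
As a rough illustration of that loop shape (not the actual pg_waldump
code), here is a self-contained toy model in C. Every name in it
(SketchEntry, SketchReader, sketch_read_archive_file,
sketch_get_wal_entry) is hypothetical, the "hash table" is a plain
array, and "flushing to a temp file" is reduced to setting a flag. It
also assumes that every completed entry other than the requested one
is a candidate for flushing, which the real code may decide
differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 8

/* Hypothetical stand-in for one hash table entry (one WAL file). */
typedef struct SketchEntry
{
    const char *fname;      /* WAL segment file name */
    bool        complete;   /* fully read from the archive? */
    bool        spilled;    /* flushed out to a temp file? */
} SketchEntry;

/* Hypothetical reader state; the array stands in for the hash table. */
typedef struct SketchReader
{
    const char *const *members; /* archive members, in archive order */
    int         nmembers;
    int         next;           /* next member to decode */
    int         per_read;       /* members completed by one read step */
    SketchEntry table[MAX_ENTRIES];
    int         nentries;
} SketchReader;

/*
 * Toy model of a read step.  One call may complete several WAL files,
 * as can happen with compressed input.  Returns false at end of archive.
 */
static bool
sketch_read_archive_file(SketchReader *r)
{
    if (r->next >= r->nmembers)
        return false;

    for (int i = 0; i < r->per_read && r->next < r->nmembers; i++)
    {
        assert(r->nentries < MAX_ENTRIES);
        SketchEntry *e = &r->table[r->nentries++];

        e->fname = r->members[r->next++];
        e->complete = true;
        e->spilled = false;
    }
    return true;
}

/*
 * Loop shape described above: read, scan the whole table for entries
 * to flush, then return the wanted entry if present, else loop.
 * Nothing here relies on a "current file" pointer.
 */
static SketchEntry *
sketch_get_wal_entry(SketchReader *r, const char *wanted)
{
    for (;;)
    {
        bool        more = sketch_read_archive_file(r);
        SketchEntry *found = NULL;

        for (int i = 0; i < r->nentries; i++)
        {
            SketchEntry *e = &r->table[i];

            if (strcmp(e->fname, wanted) == 0)
                found = e;
            else if (e->complete && !e->spilled)
                e->spilled = true;  /* real code would write a temp file */
        }

        if (found != NULL)
            return found;
        if (!more)
            return NULL;            /* archive exhausted, entry absent */
    }
}
```

The point of scanning the whole table after each read step, rather
than looking only at whichever file the reader last touched, is that
entries completed in the same step as the wanted one still get
flushed.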

regards, tom lane

Attachment                                                        Content-Type  Size
v4-0001-Fix-finalization-of-decompressor-astreamers.patch         text/x-diff   2.8 KB
v4-0002-Fix-failure-to-finalize-the-decompression-pipelin.patch   text/x-diff   6.0 KB
v4-0003-Fix-init_archive_reader-to-not-depend-on-cur_file.patch   text/x-diff   3.3 KB
