Re: pg_waldump: support decoding of WAL inside tarfile

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Amul Sul <sulamul(at)gmail(dot)com>, Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_waldump: support decoding of WAL inside tarfile
Date: 2026-03-22 23:13:11
Message-ID: 3341199.1774221191@sss.pgh.pa.us
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Everything's pushed, although it looks like it'll be a few hours
before we see results from batta or hachi.

In the meantime, I have one more change that I'd like to propose in
archive_waldump.c: I think we should drop read_archive_file's "count"
parameter and just always try to fill archive_read_buf completely,
as attached. I don't believe that using a variable count is buying
anything. I get that the idea was to not load more data than we have
to before init_archive_reader can identify the segment size and start
filtering data we don't need. But I doubt that that helps a whole
lot: I think most of the cost in this pipeline is going into
decompression and tar format parsing, not into copying data into the
hashtable. Also, I noticed while I was investigating the bugs we
just fixed that the tar file very likely contains a ton of other
data before the first WAL file. In the regression test scenarios
it seems to reliably have all of the cluster's actual table data
ahead of the WAL directory. By using a small read count, we are
slowing down scanning of all that data. Maybe not by much, I've
not tried to measure it. But I think the case for using a variable
count is very weak.

Even if you don't like that change, I think we should apply the
parts of the attached that change archive_read_buf_size into an
always-present, always-filled field of XLogDumpPrivate. Having
its presence be dependent on USE_ASSERT_CHECKING is just asking
for trouble, ie different .c files having a different opinion
of whether it's there.

regards, tom lane

Attachment Content-Type Size
v1-remove-read_archive_file-count-parameter.patch text/x-diff 4.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Jackson 2026-03-22 23:38:41 Re: Add ldapservice connection parameter
Previous Message Tom Lane 2026-03-22 21:39:26 Re: pg_waldump: support decoding of WAL inside tarfile