Re: pg_waldump: support decoding of WAL inside tarfile

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amul Sul <sulamul(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_waldump: support decoding of WAL inside tarfile
Date: 2025-09-12 18:28:15
Message-ID: CA+TgmoZjhWDG_AR1i+L1yss-wbuWvxrdRwSdVUUUnVPrJV2CnQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are some review comments on v3-0004:

In general, I think this looks pretty nice, but I think it needs more
cleanup and polishing.

There doesn't seem to be any reason for
astreamer_waldump_content_new() to take an astreamer *next argument.
If you look at astreamer.h, you'll see that some astreamer_BLAH_new()
functions take such an argument, and others don't. The ones that do
forward their input to another astreamer; the ones that don't, like
astreamer_plain_writer_new(), send it somewhere else. AFAICT, this
astreamer is never going to send its output to another astreamer, so
there's no reason for this argument.

I'm also a little confused by the choice of the name
astreamer_waldump_content_new(). I would have thought this would be
something like astreamer_waldump_new() or astreamer_xlogreader_new().
The word "content" doesn't seem to me to be adding much here, and it
invites confusion with the "content" callback.

I think you can merge setup_astreamer() into
init_tar_archive_reader(). The only other caller is
verify_tar_archive(), but that does exactly the same additional steps
as init_tar_archive_reader(), as far as I can see.

The return statement for astreamer_wal_read is really odd:

+ return (count - nbytes) ? (count - nbytes) : -1;

Since 0 is false in C, this is equivalent to: count != nbytes ? count
- nbytes : -1, but it's a strange way to write it. What makes it even
stranger is that it seems as though the intention here is to count the
number of bytes read, but you do that by taking the number of bytes
requested (count) and subtracting the number of bytes we didn't manage
to read (nbytes); and then you just up and return -1 instead of 0
whenever the answer would have been zero. This is all lacking in
comments and seems a bit more confusing than it needs to be. So my
suggestions are:

1. Consider redefining nbytes to be the number of bytes that you have
read instead of the number of bytes you haven't read. So the loop in
this function would be while (nbytes < count) instead of while (nbytes
> 0).

2. If you need to map 0 to -1, consider having the caller do this
instead of putting that inside this function.

3. Add a comment saying what the return value is supposed to be".

If you do both 1 and 2, then the return statement can just say "return
nbytes;" and the comment can say "Returns the number of bytes
successfully read."

I would suggest changing the name of the variable from "readBuff" to
"readBuf". There are no existing uses of readBuff in the code base.

I think this comment also needs improvement:

+ /*
+ * Ignore existing data if the required target page
has not yet been
+ * read.
+ */
+ if (recptr >= endPtr)
+ {
+ len = 0;
+
+ /* Reset the buffer */
+ resetStringInfo(astreamer_buf);
+ }

This comment is problematic for a few reasons. First, we're not
ignoring the existing data: we're throwing it out. Second, the comment
doesn't say why we're doing what we're doing, only that we're doing
it. Here's my guess at the actual explanation -- please correct me if
I'm wrong: "pg_waldump never reads the same WAL bytes more than once,
so if we're now being asked for data beyond the end of what we've
already read, that means none of the data we currently have in the
buffer will ever be consulted again. So, we can discard the existing
buffer contents and start over." By the way, if this explanation is
correct, it might be nice to add an assertion someplace that verifies
it, like asserting that we're always reading from an LSN greater than
or equal to (or exactly equal to?) the LSN immediately following the
last data we read.

In general, I wonder whether there's a way to make the separation of
concerns between astreamer_wal_read() and TarWALDumpReadPage()
cleaner. Right now, the latter is basically a stub, but I'm not sure
that is the best thing here. I already mentioned one example of how to
do this: make the responsibility for 0 => -1 translation the job of
TarWALDumpReadPage() rather than astreamer_wal_read(). But I think
there might be a little more we can do. In particular, I wonder
whether we could say that astreamer_wal_read() is only responsible for
filling the buffer, and the caller, TarWALDumpReadPage() in this case,
needs to empty it. That seems like it might produce a cleaner
separation of duties.

Another thing that isn't so nice right now is that
verify_tar_archive() has to open and close the archive only for
init_tar_archive_reader() to be called to reopen it again just moments
later. It would be nicer to open the file just once and then keep it
open. Here again, I wonder if the separation of duties could be a bit
cleaner.

Is there a real need to pass XLogDumpPrivate to astreamer_wal_read or
astreamer_archive_read? The only things that they need are archive_fd,
archive_name, archive_streamer, archive_streamer_buf, and
archive_streamer_read_ptr. In other words, they really don't care
about any of the *existing* things that are in XLogDumpPrivate. This
makes me wonder whether we should actually try to make this new
astreamer completely independent of xlogreader. In other words,
instead of calling it astreamer_waldump() or astreamer_xlogreader() as
I proposed above, maybe it could be a completely generic astreamer,
say astreamer_stringinfo_new(StringInfo *buf) that just appends to the
buffer. That would require also moving the stuff out of
astreamer_wal_read() that knows about XLogRecPtr, but why does that
function need to know about XLogRecPtr? Couldn't the caller figure out
that part and just tell this function how many bytes are needed?

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2025-09-12 18:47:48 Re: Eager aggregation, take 3
Previous Message Nico Williams 2025-09-12 18:21:06 Re: PostgreSQL 18 GA press release draft