Re: pg_waldump: support decoding of WAL inside tarfile

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amul Sul <sulamul(at)gmail(dot)com>
Cc: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_waldump: support decoding of WAL inside tarfile
Date: 2026-01-19 13:30:48
Message-ID: CA+TgmoYMtcZBaqy9r59eDapaDy3WOdepkFFURu9MV-x-kxEbKg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 19, 2026 at 6:14 AM Amul Sul <sulamul(at)gmail(dot)com> wrote:
> We previously tried having the astreamer function perform all
> necessary checks (e.g., TLI and segment number), but this became
> problematic in init_archive_reader when we needed to read a single WAL
> page just to determine the WAL segment size without any validation. To
> bypass those checks, I initially used a READ_ANY_WAL() macro, but it
> felt like a hack.
>
> In the current design, the archive streamer is responsible only for
> reading and copying data to the buffer. To optimize this and avoid
> unnecessary memcpy and data buffering for WAL files that aren't
> needed, we set privateInfo->cur_wal to NULL, signaling the streamer
> code to skip those operations.

Hmm. In that case, the code definitely deserves a comment, as that was
not at all clear to me.

> But, I am thinking that instead of setting privateInfo->cur_wal to
> NULL, we could simply discard the buffer data at that place and let
> memcpy in astreamer_content copy if it would be that much of an issue.

I don't think doing unnecessary copying is the right way forward. The
copying itself could be expensive, but I am a little concerned about
the memory utilization of this code. Suppose the user has increased
the WAL segment size to 1GB or even higher. It seems like we could
buffer a whole segment or maybe even more in some scenarios. If we
avoid copying data we don't need, then we also avoid buffering it in
memory.

In terms of the separation of concerns, we could view setting
privateInfo->cur_wal = NULL as a form of signaling, a way for this
code to tell the astreamer that it doesn't need the data buffering.
However, I think it might be better to make the signaling more
explicit. Instead of having the caller directly set the buffer to
NULL, or directly trim data out of the buffer, maybe it should set
some value in privateInfo that tells the astreamer what to do. For
instance, suppose it sets the oldest LSN that it might still care
about in privateInfo, and then the astreamer is free to do skip
copying of any data prior to that LSN, and discard any that it already
has. Especially if properly commented, I think this might be more
clear than what you have now. I am not 100% sure that's the right
idea, though. I just think right now it's a bit murky who does what
and why the division of responsibilities is what it is.

For example, read_archive_wal_page() currently has some responsibility
for discarding data, but is that because it is the right place to
discard data, or is that just because that's where we have an LSN
available that tells us what we can discard? I don't know, but an
explicit signaling mechanism like what I describe in the previous
paragraph can make it possible for those two things to happen in
different places.

> In the earlier version, we had only one buffer where we copied
> streamed data, but a problem (previously discussed [1]) arose when the
> archive streamer reads data from two WAL files -- i.e., one is ending
> and the next is starting. We have to keep track of where the previous
> file ends and the next one starts, along with the WAL segment
> information that each belongs to.

Hmm, interesting. My first thought was that if we were starting a new
WAL file, isn't it time to flush the data from the old one to a spill
file? But maybe it's not, if the WAL reader can reread records, and a
record can span file boundaries. But probably we need some comment so
that the next reader is not confused, and we definitely need to make
sure that the structure members in ArchivedWALEntry are well-chosen
and well-documented.

> > + if (strstr(member->pathname, "PaxHeaders."))
> > + return false;
> >
> > There is no way that a filename containing the string "PaxHeaders."
> > could ever pass the IsXLogFileName test just above. We shouldn't need
> > this.
> >
>
> This checks the directory name (e.g.,
> x_dir/y_dir/PaxHeaders.NNNN/wal_file_name). The name of that metadata
> file is exactly the same as the WAL file name, which is why
> IsXLogFileName() doesn't help here.

I think this code should only be considering files in the toplevel
directory, and skipping over any directories it finds. I absolutely
promise I am not going to commit anything that is specifically looking
for PaxHeaders. Nothing we've ever done with tar files up to now has
required that, and I don't think this should, either.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2026-01-19 13:37:43 Re: SQL:2011 Application Time Update & Delete
Previous Message Álvaro Herrera 2026-01-19 13:09:48 Re: [PATCH] psql: add \dcs to list all constraints