Re: pg_waldump: support decoding of WAL inside tarfile

From: Amul Sul <sulamul(at)gmail(dot)com>
To: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
Cc: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_waldump: support decoding of WAL inside tarfile
Date: 2025-11-26 06:02:04
Message-ID: CAAJ_b94SEcBVJcsp0Y1-YvLqZWBHPQH4FhRzSJfaH_ah_eL_FQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 25, 2025 at 2:21 PM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>
> Hi Amul,
>
> I reviewed the patch and got some comments:
>

Thanks for the review. Replying inline below.

> 1 - 0001 - pg_waldump.h
> ```
> + * pg_waldump.h - decode and display WAL
> + *
> + * Copyright (c) 2013-2025, PostgreSQL Global Development Group
> ```
>
> This header file is brand new, so copyright year should be only 2025.
>

Fixed in the attached version.

> 2 - 0001 - pg_waldump.c
> ```
> -static int WalSegSz;
> +int WalSegSz = DEFAULT_XLOG_SEG_SIZE;
> ```
>
> 0001 claims a refactoring, but if you initialize WalSegSz with DEFAULT_XLOG_SEG_SIZE, then the behavior is changing, this change is no longer a pure refactor.
>
> I would suggest leave WalSegSz uninitiated (compiler will set 0 to it), then no behavior change, so that 0001 stays a self-contained pure refactor.
>

Agreed.

> The other nit thing is that, as “static” is removed, now “WalSegSz” is placed in middle of two static variables, which looks not good. If I were making the code change, I would have moved WalSegSz to after all static variables.
>

I placed it before the static declaration.

> 3 - 0002
> ```
> @@ -383,21 +406,11 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
> XLogRecPtr targetPtr, char *readBuff)
> {
> XLogDumpPrivate *private = state->private_data;
> - int count = XLOG_BLCKSZ;
> + int count = required_read_len(private, targetPagePtr, reqLen);
> WALReadError errinfo;
>
> - if (XLogRecPtrIsValid(private->endptr))
> - {
> - if (targetPagePtr + XLOG_BLCKSZ <= private->endptr)
> - count = XLOG_BLCKSZ;
> - else if (targetPagePtr + reqLen <= private->endptr)
> - count = private->endptr - targetPagePtr;
> - else
> - {
> - private->endptr_reached = true;
> - return -1;
> - }
> - }
> + if (private->endptr_reached)
> + return -1;
> ```
>
> This change introduces a logic hole. In old code, it sets private->endptr_reached = true; and return -1. In the code code, count and private->endptr_reached assignments are wrapped into required_read_len(). However, required_read_len() doesn’t check if private->endptr_reached has already been true, so that the logic hole is that, if private->endptr_reached is already true when calling required_read_len(), and required_read_len() returns a positive count, if (private->endptr_reached) will also be satisfied and return -1 from the function.
>
> So, to be safe, we should check “if (count < 0) return -1”.
>

I do not really understand the logical hole where the behaviour is the
same as the previous, but I like the idea of checking endptr_reached.
This is quite unlikely to be true, but it looks like good practice to
check that flag before setting it. Did it that way in the attached
version.

> 4 - 0002
> ```
> +/* Returns the size in bytes of the data to be read. */
> +static inline int
> +required_read_len(XLogDumpPrivate *private, XLogRecPtr targetPagePtr,
> + int reqLen)
> +{
> ```
>
> The function comment is too simple. It doesn’t cover the case where -1 is returned.
>

Okay.

> 5 - 0003
> ```
> +my @scenario = (
> + {
> + 'path' => $node->data_dir
> + });
>
> -(at)lines = test_pg_waldump('--limit' => 6);
> -is(@lines, 6, 'limit option observed');
> +for my $scenario (@scenario)
> +{
> ```
>
> "my @scenario” should be "my @scenarios”, so that for line become "for my $scenario (@scenarios)”, a little bit clearer.
>

Done.

> 6 - 0003
> ```
> + SKIP:
> + {
> ```
>
> Why SKIP label is defined here? A SKIP label usually follows a skip statement, for example: in bin/pg_ctl/t/001_start_stop.pl
> ```
> SKIP:
> {
> skip "unix-style permissions not supported on Windows", 2
> if ($windows_os);
>
> ok(-f $logFileName);
> ok(check_mode_recursive("$tempdir/data", 0700, 0600));
> }
> ```
>

Yeah, I knew that, but that is needed in the next patch where I wanted
to avoid a large diff when introducing SKIP and the associated
indentation. This patch is not expected to be committed independently,
and I have added a note in the commit message for the same.

> 7 - 0004 - Makefile
> ```
> $(WIN32RES) \
> compat.o \
> pg_waldump.o \
> + archive_waldump.o \
> rmgrdesc.o \
> xlogreader.o \
> xlogstats.o
> ```
>
> Obviously the list was in alphabetical order, so archive_waldump.o should be placed before compat.o.
>

Done.

> 8 - 0004
> ```
> +/*
> + * pg_waldump's XLogReaderRoutine->page_read callback to support dumping WAL
> + * files from tar archives.
> + */
> +static int
> +TarWALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
> + XLogRecPtr targetPtr, char *readBuff)
> +{
> + XLogDumpPrivate *private = state->private_data;
> + int count = required_read_len(private, targetPagePtr, reqLen);
> ```
>
> Looking the page_read’s spec:
> ```
> /*
> * Data input callback
> *
> * This callback shall read at least reqLen valid bytes of the xlog page
> * starting at targetPagePtr, and store them in readBuf. The callback
> * shall return the number of bytes read (never more than XLOG_BLCKSZ), or
> * -1 on failure. The callback shall sleep, if necessary, to wait for the
> * requested bytes to become available. The callback will not be invoked
> * again for the same page unless more than the returned number of bytes
> * are needed.
> *
> * targetRecPtr is the position of the WAL record we're reading. Usually
> * it is equal to targetPagePtr + reqLen, but sometimes xlogreader needs
> * to read and verify the page or segment header, before it reads the
> * actual WAL record it's interested in. In that case, targetRecPtr can
> * be used to determine which timeline to read the page from.
> *
> * The callback shall set ->seg.ws_tli to the TLI of the file the page was
> * read from.
> */
> XLogPageReadCB page_read;
> ```
>
> It says that page_read must read reqLen bytes, otherwise it should wait for more bytes.
>
> However,
just calculate how many bytes can read and only read that long, which
breaks the protocol. Is it a problem?
>

The behaviour is the same as the routine used to read the bare WAL
file. I don't think there will be any problem for the pg_waldump.

> 9 - 0004
> ```
> +/*
> + * Create an astreamer that can read WAL from tar file.
> + */
> +static astreamer *
> +astreamer_waldump_new(XLogDumpPrivate *privateInfo)
> +{
> + astreamer_waldump *streamer;
> +
> + streamer = palloc0(sizeof(astreamer_waldump));
> + *((const astreamer_ops **) &streamer->base.bbs_ops) =
> + &astreamer_waldump_ops;
> +
> + streamer->privateInfo = privateInfo;
> +
> + return &streamer->base;
> +}
> ```
>
> This function allocates memory for streamer but only returns &streamer->base, so memory of streamer is leaked.
>

May I know why you think there would be a memory leak? I believe the
address of the structure is the same as the address of its first
member, base. I am returning base because the goal is to return a
generic astreamer type, which is the standard approach used in other
archive streamer code.

> Also, in the function comment, “from tar file” => “from a tar file”.
>
> 10 - 0004
> ```
> + * End-of-stream processing for a astreamer_waldump stream.
> ```
>
> Nit typo: a => an
>

Done.

> 11 - 0004
> ```
> + if (!IsValidWalSegSize(WalSegSz))
> + {
> + pg_log_error(ngettext("invalid WAL segment size in WAL file from archive \"%s\" (%d byte)",
> + "invalid WAL segment size in WAL file from archive \"%s\" (%d bytes)",
> + WalSegSz),
> + privateInfo->archive_name, WalSegSz);
> + pg_log_error_detail("The WAL segment size must be a power of two between 1 MB and 1 GB.");
> + exit(1);
> + }
> ```
>
> Why don’t pg_fatal()?
>

This is how we do when we need to emit error details as well.

> 12 - 0005
> ```
> + /* Create a temporary file if one does not already exist */
> + if (!entry->tmpseg_exists)
> + {
> + write_fp = prepare_tmp_write(entry->segno);
> + entry->tmpseg_exists = true;
> + }
> +
> + /* Flush data from the buffer to the file */
> + perform_tmp_write(entry->segno, &entry->buf, write_fp);
> + resetStringInfo(&entry->buf);
> +
> + /*
> + * The change in the current segment entry indicates that the reading
> + * of this file has ended.
> + */
> + if (entry != privateInfo->cur_wal && write_fp != NULL)
> + {
> + fclose(write_fp);
> + write_fp = NULL;
> + }
> ```
>
> When entry->tmpseg_exists is true, then write_fp will not be initialized, but there should be a check to make sure write_fp is not NULL before perform_tmp_write().
>

perform_tmp_write() has assert for the same.

> Also, if write_fp != NULL, should we anyway close the file without considering entry != privateInfo->cur_wal? Otherwise write_fp may be left open.
>

We read the WAL from the tar file in chunks, and those same chunks are
written to the temporary file within the loop. If we close the
temporary file now, we will have to open it again later for the next
chunk write. Could you elaborate on a scenario where you believe this
file might be left open unintentionally?

> 13 - 0005
> ```
> + * Use the directory specified by the TEMDIR environment variable. If it’s
> ```
>
> Typo: TEMDIR => TMPDIR
>

Done.

> 14 - 0005
> ```
> + * Set up a temporary directory to temporarily store WAL segments.
> ```
>
> temporary and temporarily are redundant.
>

I believe that is grammatically correct and clear.

> No comment for 0007.
>
> 15 - 0007
>
> I wonder why we need to manually po files? This is the first time I see a patch including po file changes.
>

Okay, I included that initially to ensure the PO file update wasn't
overlooked during commit. I have removed it to minimize the diff and
added the note in the patch commit message.

> 16 - 0008
> ```
> + {
> + pg_log_error("wal archive not found");
> + pg_log_error_hint("Specify the correct path using the option -w/--wal-path."
> + "Or you must use -n/--no-parse-wal when verifying a tar-format backup.");
> + exit(1);
> + }
> ```
>
> “wal” should be “WAL”.
>
> In the hint message, there should be a white space between the two sentences.
>

Done.

Thanks again for your review comments; they are quite helpful. Kindly
take a look at the attached version.

Regards,
Amul

Attachment Content-Type Size
v9-0001-Refactor-pg_waldump-Move-some-declarations-to-new.patch application/x-patch 2.4 KB
v9-0002-Refactor-pg_waldump-Separate-logic-used-to-calcul.patch application/x-patch 2.5 KB
v9-0003-Refactor-pg_waldump-Restructure-TAP-tests.patch application/x-patch 5.6 KB
v9-0004-pg_waldump-Add-support-for-archived-WAL-decoding.patch application/x-patch 37.4 KB
v9-0005-pg_waldump-Remove-the-restriction-on-the-order-of.patch application/x-patch 13.4 KB
v9-0006-pg_verifybackup-Delay-default-WAL-directory-prepa.patch application/x-patch 1.7 KB
v9-0007-pg_verifybackup-Rename-the-wal-directory-switch-t.patch application/x-patch 5.9 KB
v9-0008-pg_verifybackup-enabled-WAL-parsing-for-tar-forma.patch application/x-patch 9.9 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nishant Sharma 2025-11-26 06:05:04 Re: [PATCH] Add pg_get_tablespace_ddl() function to reconstruct CREATE TABLESPACE statement
Previous Message Corey Huinker 2025-11-26 05:50:14 Re: Extended Statistics set/restore/clear functions.