Re: pgsql: Add contrib/pg_walinspect.

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, Noah Misch <noah(at)leadboat(dot)com>, Jeff Davis <jdavis(at)postgresql(dot)org>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgsql: Add contrib/pg_walinspect.
Date: 2022-04-28 00:11:40
Message-ID: CA+hUKG+VXatd-_ykFdnQoGrBb9NuNz8-YMGiWMMtNEPp9rU3Vw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

On Wed, Apr 27, 2022 at 10:22 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> Here's v2 patch (up on Thomas's v1 at [1]) using private_data to set
> the end of the WAL flag. Please have a look at it.

I don't have a strong view on whether it's better to use a NULL error
for this private communication between pg_walinspect.c and the
read_page callback it installs, or install a custom private_data to
signal this condition, or to give up on all this stuff completely and
just wait (see below for thoughts). I'd feel better about both
no-wait options if the read_page callback in question were actually in
the contrib module, and not in the core code. On the other hand, I'm
not too hung up about it because I'm really hoping to see the
get-rid-of-all-these-callbacks-and-make-client-code-do-the-waiting
scheme proposed by Horiguchi-san and Heikki developed further, to rip
much of this stuff out in a future release.

If you go with the private_data approach, a couple of small comments:

if (record == NULL)
{
+ ReadLocalXLogPageNoWaitPrivate *private_data;
+
+ /* return NULL, if end of WAL is reached */
+ private_data = (ReadLocalXLogPageNoWaitPrivate *)
+ xlogreader->private_data;
+
+ if (private_data->end_of_wal)
+ return NULL;
+
if (errormsg)
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not read WAL at %X/%X: %s",
LSN_FORMAT_ARGS(first_record), errormsg)));
- else
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not read WAL at %X/%X",
- LSN_FORMAT_ARGS(first_record))));
}

I think you should leave that second ereport() call in, in this
variant of the patch, no? I don't know if anything else raises errors
with no message, but if we're still going to treat them as silent
end-of-data conditions then you might as well go with the v1 patch.

Another option might be to abandon this whole no-wait concept and
revert 2258e76f's changes to xlogutils.c. pg_walinspect already does
preliminary checks that LSNs are in range, so you can't enter a value
that will wait indefinitely, and it's interruptible (it's a 1ms
sleep/check loop, not my favourite programming pattern but that's
pre-existing code). If you're unlucky enough to hit the case where
the LSN is judged to be valid but is in the middle of a record that
hasn't been totally flushed yet, it'll just be a bit slower to return
as we wait for the inevitable later flush(es) to happen. The rest of
your record will *surely* be flushed pretty soon (or the flushing
backend panics the whole system and time ends). I don't imagine this
is performance critical work, so maybe that'd be acceptable?

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Michael Paquier 2022-04-28 01:12:47 pgsql: Replace existing durable_rename_excl() calls with durable_rename
Previous Message Tom Lane 2022-04-27 18:08:45 Re: Unstable tests for recovery conflict handling

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2022-04-28 00:12:13 Re: Possible corruption by CreateRestartPoint at promotion
Previous Message Peter Smith 2022-04-27 23:49:56 Multi-Master Logical Replication