Re: pgsql: Add contrib/pg_walinspect.

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Noah Misch <noah(at)leadboat(dot)com>, Jeff Davis <jdavis(at)postgresql(dot)org>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgsql: Add contrib/pg_walinspect.
Date: 2022-04-27 00:06:30
Message-ID: CA+hUKGLtswFk9ZO3WMOqnDkGs6dK5kCdQK9gxJm0N8gip5cpiA@mail.gmail.com
Lists: pgsql-committers pgsql-hackers

On Tue, Apr 26, 2022 at 5:36 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> On Tue, Apr 26, 2022 at 01:25:14AM -0400, Tom Lane wrote:
> > I've been wondering if the issue could be traced to topminnow's unusual
> > hardware properties, specifically that it has MAXALIGN 8 even though
> > it's only a 32-bit machine per sizeof(void *). I think the only
> > other active buildfarm animal like that is my gaur ... but I've
> > failed to reproduce it on gaur. Best guess at the moment is that
> > it's a timing issue that topminnow manages to reproduce often.
>
> I have managed to miss your message. Let's continue the discussion
> there, then.

I think it's a bug in pg_walinspect, so I'll move the discussion back
here. Here's one rather simple fix, which has survived running the
test a thousand times (using a recipe that previously failed for me
quite soon, after 20-100 attempts or so; I never figured out how to
get the 50% failure rate reported by Tom). The explanation is in the
commit message. You can see that the comments near the first hunk
already contemplated this possibility, but just didn't try to handle it.

Another idea that I slept on, but rejected, is that the new WOULDBLOCK
return value introduced to support WAL prefetching could be used here
(it's a way of reporting a lack of data, different from errors).
Unfortunately it's not exposed to the XLogReadRecord() interface, as I
only intended it for use by XLogReadAhead(). I don't really think
it's a good idea to redesign that API at this juncture.
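
To make the distinction between "lack of data" and "error" concrete,
here is a minimal toy sketch in C. It is not PostgreSQL code and not
the attached patch: the ToyReadResult enum, toy_read_record(), and the
one-byte-length record format are all hypothetical names and
assumptions made up for illustration. It only shows the general shape
of a reader that reports a record extending past the flushed position
separately from a record that is actually invalid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical three-way result; these names are NOT PostgreSQL's. */
typedef enum
{
    TOY_READ_OK,         /* a whole record was available */
    TOY_READ_WOULDBLOCK, /* record not fully flushed yet: lack of data */
    TOY_READ_ERROR       /* record is invalid */
} ToyReadResult;

/*
 * Toy log format (an assumption for this sketch): each record starts
 * with a one-byte total length, header byte included, followed by the
 * payload.  flush_lsn is the number of bytes known to be flushed and
 * may fall in the middle of a record.
 */
static ToyReadResult
toy_read_record(const uint8_t *wal, size_t flush_lsn,
                size_t *pos, size_t *rec_len)
{
    size_t len;

    assert(wal != NULL && pos != NULL && rec_len != NULL);

    if (*pos >= flush_lsn)
        return TOY_READ_WOULDBLOCK; /* nothing flushed at this position */

    len = wal[*pos];
    if (len < 2)
        return TOY_READ_ERROR;      /* need a header byte plus payload */
    if (len > flush_lsn - *pos)
        return TOY_READ_WOULDBLOCK; /* tail of the record not flushed */

    /* Only advance the read position for a complete record. */
    *rec_len = len;
    *pos += len;
    return TOY_READ_OK;
}
```

A caller in this toy model would stop cleanly on TOY_READ_WOULDBLOCK
(and could retry once the flush position has advanced), and would
raise an error only on TOY_READ_ERROR.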

Maybe there is some other way I haven't considered -- is there a way
to get the LSN past the latest whole flushed record from shmem?

Attachment: 0001-Fix-pg_walinspect-race-against-flush-LSN.patch (text/x-patch, 4.2 KB)
