Re: WIP: WAL prefetch (another approach)

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: WAL prefetch (another approach)
Date: 2021-04-21 20:16:43
Message-ID: CA+hUKGLQqsNBXXZ6uXFi7L9iqx-4ZSuiAvKiPqDtRQs+yFD9ew@mail.gmail.com
Lists: pgsql-hackers

On Thu, Apr 22, 2021 at 8:07 AM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> On 4/21/21 6:30 PM, Tom Lane wrote:
> > Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> >> Yeah, it would have been nice to include that but it'll have to be for
> >> v15 due to lack of time to convince myself that it was correct. I do
> >> intend to look into more concurrency of that kind for v15. I have
> >> pushed these patches, updated to be disabled by default.
> >
> > I have a fairly bad feeling about these patches. I've already fixed
> > one critical bug (see 9e4114822), but I am still seeing random, hard
> > to reproduce failures in WAL replay testing. It looks like sometimes
> > the "decoded" version of a WAL record doesn't match what I see in
> > the on-disk data, which I'm having no luck tracing down.

Ugh. Looking into this now. Also, this week I have been researching
a possible problem with eg ALTER TABLE SET TABLESPACE in the higher
level patch, which I'll write about soon.

> > I am not sure whether the checksum failure itself is real or a variant
> > of the seeming bad-reconstruction problem, but what I'm on about right
> > at this moment is that the error handling logic for this case seems
> > quite broken. Why is a checksum failure only worthy of a LOG message?
> > Why is ValidXLogRecord() issuing a log message for itself, rather than
> > being tied into the report_invalid_record() mechanism? Why are we
> > evidently still trying to decode records afterwards?
>
> Yeah, that seems suspicious.

I may have invited trouble by deciding to rebase on the other proposal
late in the cycle. That changed the interfaces around there.

> > In general, I'm not too pleased with the apparent attitude in this
> > thread that it's okay to push a patch that only mostly works on the
> > last day of the dev cycle and plan to stabilize it later.
>
> Was there such an attitude? I don't think people were arguing for pushing a
> patch that's not working correctly. The discussion was mostly about getting
> it committed and leaving some optimizations for v15.

That wasn't my plan, but I admit that the timing was non-ideal. In
any case, I'll dig into these failures and then consider options.
More soon.
