Re: WIP: WAL prefetch (another approach)

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Daniel Gustafsson <daniel(at)yesql(dot)se>, Stephen Frost <sfrost(at)snowman(dot)net>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, David Steele <david(at)pgmasters(dot)net>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: WAL prefetch (another approach)
Date: 2021-11-27 01:47:01
Message-ID: CA+hUKGKxvj8g1oL7iGaywoe0E-bNSbSPVtSWZ05CnUNWJzEJtw@mail.gmail.com
Lists: pgsql-hackers

On Sat, Nov 27, 2021 at 12:34 PM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> One thing that's not clear to me is what happened to the reasons why
> this feature was reverted in the PG14 cycle?

Reasons for reverting:

1. A bug in commit 323cbe7c, "Remove read_page callback from
XLogReader.", which I couldn't easily revert in isolation. To keep
things simple, this new version no longer depends on that change.
(That particular bug has been fixed in a newer version of that
patch[1], which incidentally I still think was a good idea.)
2. A bug where allocation for large records happened before
validation. Concretely, this patch calls XLogReadRecordAlloc() after
validating the header (usually, same as master), whereas commit
f003d9f8 did it first. (Andres pointed out[2] that more work is needed
to make that logic more robust, and I'm keen to look into it, but
that's independent of this work.)
3. A wild goose chase for bugs on Tom Lane's antique 32-bit PPC
machine. Tom eventually reproduced the failure with the patches
reverted, which seemed to exonerate them but didn't leave a good
feeling: what was happening, and why did the patches hugely increase
the likelihood of the failure mode? I have no new information on that,
but I know that several people, myself included, spent a huge amount
of time and effort trying to reproduce it on various types of systems.
So despite never pinning down a bug, this certainly contributed to a
feeling that the patch had run out of steam for the 14 cycle.
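
Reason 2 is about ordering: a record's length field should be validated before it is trusted to size an allocation, otherwise a garbage header can request an enormous buffer. The following is a minimal C sketch of that validate-then-allocate pattern only. It assumes a hypothetical, simplified header (`SketchRecordHeader`) and a hypothetical size limit (`SKETCH_MAX_RECORD_LEN`); it is not PostgreSQL's actual XLogReadRecord()/XLogReadRecordAlloc() code, and it ignores records that span multiple pages.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified header: the first 4 bytes hold the total record
 * length (header included), in host byte order. Not PostgreSQL's XLogRecord. */
typedef struct
{
    uint32_t tot_len;
} SketchRecordHeader;

/* Hypothetical sanity limit on a single record's size. */
#define SKETCH_MAX_RECORD_LEN (1024u * 1024u)

/*
 * Copy one record out of 'buf' into a newly allocated buffer.
 * Returns NULL if the header is incomplete or implausible, or if the whole
 * record is not available. The length field is validated BEFORE it is used
 * to size the allocation, so a bogus header cannot trigger a huge malloc().
 */
static unsigned char *
sketch_read_record(const unsigned char *buf, size_t avail)
{
    SketchRecordHeader hdr;
    unsigned char *rec;

    if (avail < sizeof(hdr))
        return NULL;            /* not even a complete header yet */
    memcpy(&hdr, buf, sizeof(hdr));

    /* Step 1: validate the header. */
    if (hdr.tot_len < sizeof(hdr) || hdr.tot_len > SKETCH_MAX_RECORD_LEN)
        return NULL;            /* implausible length: reject */
    if (hdr.tot_len > avail)
        return NULL;            /* record not fully available (simplified) */

    /* Step 2: only now is tot_len trusted to size an allocation. */
    assert(hdr.tot_len >= sizeof(hdr));
    rec = malloc(hdr.tot_len);
    if (rec == NULL)
        return NULL;
    memcpy(rec, buf, hdr.tot_len);
    return rec;
}
```

Allocating first and validating afterwards would let a corrupt length field drive the malloc() size directly; checking bounds first confines any allocation to lengths that already passed validation.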

This week I'll have another crack at getting the TAP test I proposed,
which runs the regression tests with a streaming replica, to work on
Windows. It does approximately what Tom was doing when he saw problem
#3, and I'd like to have it as standard across the build farm.

[1] https://www.postgresql.org/message-id/20211007.172820.1874635561738958207.horikyota.ntt%40gmail.com
[2] https://www.postgresql.org/message-id/20210505010835.umylslxgq4a6rbwg%40alap3.anarazel.de
