Re: WIP: WAL prefetch (another approach)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Stephen Frost <sfrost(at)snowman(dot)net>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, David Steele <david(at)pgmasters(dot)net>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: WAL prefetch (another approach)
Date: 2021-12-18 00:38:13
Message-ID: 1431256.1639787893@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greg Stark <stark(at)mit(dot)edu> writes:
> But the bigger question is. Are we really concerned about this flaky
> problem? Is it worth investing time and money on? I can get money to
> go buy a G4 or G5 and spend some time on it. It just seems a bit...
> niche. But if it's a real bug that represents something broken on
> other architectures that just happens to be easier to trigger here it
> might be worthwhile.

TBH, I don't know. There seem to be three plausible explanations:

1. Flaky hardware in my unit.
2. Ancient macOS bug, as Andres suggested upthread.
3. Actual PG bug.

If it's #1 or #2 then we're just wasting our time here. I'm not
sure how to estimate the relative probabilities, but I suspect
#3 is the least likely of the lot.

FWIW, I did just reproduce the problem on that machine with current HEAD:

2021-12-17 18:40:40.293 EST [21369] FATAL: inconsistent page found, rel 1663/167772/2673, forknum 0, blkno 26
2021-12-17 18:40:40.293 EST [21369] CONTEXT: WAL redo at C/3DE3F658 for Btree/INSERT_LEAF: off 208; blkref #0: rel 1663/167772/2673, blk 26 FPW
2021-12-17 18:40:40.522 EST [21365] LOG: startup process (PID 21369) exited with exit code 1

That was after only five loops of the regression tests, so either
I got lucky or the failure probability has increased again.

In any case, it seems clear that the problem exists independently of
Munro's patches, so I don't really think this question should be
considered a blocker for those.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2021-12-18 01:34:16 Re: Column Filtering in Logical Replication
Previous Message Greg Stark 2021-12-17 23:55:50 Re: WIP: WAL prefetch (another approach)