From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Greg Stark <stark(at)mit(dot)edu> |
Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Stephen Frost <sfrost(at)snowman(dot)net>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, David Steele <david(at)pgmasters(dot)net>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: WIP: WAL prefetch (another approach) |
Date: | 2021-12-18 00:38:13 |
Message-ID: | 1431256.1639787893@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Greg Stark <stark(at)mit(dot)edu> writes:
> But the bigger question is. Are we really concerned about this flaky
> problem? Is it worth investing time and money on? I can get money to
> go buy a G4 or G5 and spend some time on it. It just seems a bit...
> niche. But if it's a real bug that represents something broken on
> other architectures that just happens to be easier to trigger here it
> might be worthwhile.
TBH, I don't know. There seem to be three plausible explanations:
1. Flaky hardware in my unit.
2. Ancient macOS bug, as Andres suggested upthread.
3. Actual PG bug.
If it's #1 or #2 then we're just wasting our time here. I'm not
sure how to estimate the relative probabilities, but I suspect
#3 is the least likely of the lot.
FWIW, I did just reproduce the problem on that machine with current HEAD:
2021-12-17 18:40:40.293 EST [21369] FATAL: inconsistent page found, rel 1663/167772/2673, forknum 0, blkno 26
2021-12-17 18:40:40.293 EST [21369] CONTEXT: WAL redo at C/3DE3F658 for Btree/INSERT_LEAF: off 208; blkref #0: rel 1663/167772/2673, blk 26 FPW
2021-12-17 18:40:40.522 EST [21365] LOG: startup process (PID 21369) exited with exit code 1
That was after only five loops of the regression tests, so either
I got lucky or the failure probability has increased again.
In any case, it seems clear that the problem exists independently of
Munro's patches, so I don't really think this question should be
considered a blocker for those.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2021-12-18 01:34:16 | Re: Column Filtering in Logical Replication |
Previous Message | Greg Stark | 2021-12-17 23:55:50 | Re: WIP: WAL prefetch (another approach) |