From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | Justin Pryzby <pryzby(at)telsasoft(dot)com> |
Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Zhang Mingli <zmlpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: pg15b4: FailedAssertion("TransactionIdIsValid(xmax) |
Date: | 2022-09-12 02:10:47 |
Message-ID: | CAH2-WzmRLLqhBTRBg5WuMt4ShDygzKP4V7PiO23uQdDxwF18SQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Sun, Sep 11, 2022 at 6:42 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> I think what you're saying is that this can be explained by the
> io_concurrency bug in recovery_prefetch, if run under 15b3.
>
> But yesterday I started from initdb and restored this cluster from backup, and
> started up sqlsmith, and sent some kill -9, and now got more corruption.
> Looks like it took ~10 induced crashes before this happened.
Have you tested fsync on the system?
The symptoms here are all over the place. This assertion failure seems
like a pretty good sign that the problems happen during recovery, or
because basic guarantees needed for crash safety aren't met:
> #2 0x0000000000962c5c in ExceptionalCondition (conditionName=conditionName(at)entry=0x9ce238 "P_ISLEAF(opaque) && !P_ISDELETED(opaque)", errorType=errorType(at)entry=0x9bad97 "FailedAssertion",
> fileName=fileName(at)entry=0x9cdcd1 "nbtpage.c", lineNumber=lineNumber(at)entry=1778) at assert.c:69
> #3 0x0000000000507e34 in _bt_rightsib_halfdeadflag (rel=rel(at)entry=0x7f4138a238a8, leafrightsib=leafrightsib(at)entry=53) at nbtpage.c:1778
> #4 0x0000000000507fba in _bt_mark_page_halfdead (rel=rel(at)entry=0x7f4138a238a8, leafbuf=leafbuf(at)entry=13637, stack=stack(at)entry=0x144ca20) at nbtpage.c:2121
This shows that the basic rules for page deletion seem to have been
violated somehow. It's as if a page deletion went ahead, but
didn't work as an atomic operation -- there were some lost writes for
some but not all pages. Actually, it looks like a mix of states from
before and after both the first and the second phases of page deletion
-- so not just one atomic operation.
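To make the failed condition concrete, here is a toy model of what the
assertion in frame #3 checks: the right sibling of the leaf being marked
half-dead must itself be a leaf page that has not already been deleted.
This is only a sketch, not the nbtree code. The struct and the toy_*
helpers are hypothetical names, and the flag bit values are assumed to
match nbtree.h rather than copied from it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag bits; the real definitions live in nbtree.h. */
#define BTP_LEAF      (1 << 0)   /* page is a leaf page */
#define BTP_DELETED   (1 << 2)   /* second phase done: page unlinked */
#define BTP_HALF_DEAD (1 << 4)   /* first phase done: page marked half-dead */

/* Hypothetical stand-in for the page's special area. */
typedef struct ToyPageOpaque
{
    uint16_t    btpo_flags;
} ToyPageOpaque;

static bool
toy_is_leaf(const ToyPageOpaque *opaque)
{
    return (opaque->btpo_flags & BTP_LEAF) != 0;
}

static bool
toy_is_deleted(const ToyPageOpaque *opaque)
{
    return (opaque->btpo_flags & BTP_DELETED) != 0;
}

static bool
toy_is_halfdead(const ToyPageOpaque *opaque)
{
    return (opaque->btpo_flags & BTP_HALF_DEAD) != 0;
}

/* Same shape as the condition in the backtrace:
 * P_ISLEAF(opaque) && !P_ISDELETED(opaque) */
static bool
toy_rightsib_precondition_holds(const ToyPageOpaque *rightsib)
{
    return toy_is_leaf(rightsib) && !toy_is_deleted(rightsib);
}

/* Checks the precondition, then reports the right sibling's half-dead flag. */
static bool
toy_rightsib_halfdeadflag(const ToyPageOpaque *rightsib)
{
    assert(toy_rightsib_precondition_holds(rightsib));
    return toy_is_halfdead(rightsib);
}
```

In this model a live leaf and a half-dead leaf both pass the check, while
a leaf that already carries the deleted flag does not. Seeing such a page
as a leaf's right sibling is the kind of mixed state described above.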
--
Peter Geoghegan