Re: GIN data corruption bug(s) in 9.6devel

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN data corruption bug(s) in 9.6devel
Date: 2016-04-17 20:03:34
Message-ID: CAMkU=1xEbup-ARaGOd2AQampJeZApq5WGNF9JHc86fw1CnJmtQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 12, 2016 at 9:53 AM, Teodor Sigaev <teodor(at)sigaev(dot)ru> wrote:
>
> With the pending cleanup patch, a backend will try to get a lock on the
> metapage with ConditionalLockPage. Will it interrupt the autovacuum worker?

No, ConditionalLockPage should not interrupt the autovacuum worker.
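
As a rough illustration of what "conditional" means here, the toy model below uses Python's threading.Lock as a stand-in for the metapage lock. It is not PostgreSQL's lock manager, and the function name try_cleanup is hypothetical. The point it sketches is that a conditional request returns immediately when the lock is held, so the requester never sits in a wait queue behind the holder.

```python
import threading


def try_cleanup(lock):
    """Run cleanup only if `lock` is free right now; never wait for it.

    Toy stand-in for a conditional lock attempt: returns "skipped" when
    someone else holds the lock, "cleaned" otherwise.
    """
    # blocking=False makes acquire() return False at once instead of waiting.
    if not lock.acquire(blocking=False):
        return "skipped"
    try:
        # Placeholder for the real cleanup work.
        return "cleaned"
    finally:
        lock.release()
```

If another party already holds the lock, try_cleanup returns "skipped" without blocking; once the lock is free it returns "cleaned".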

>>
>> Alvaro's recommendation, to let the cleaner off the hook once it
>> passes the page which was the tail page at the time it started, would
>> prevent any process from getting pinned down indefinitely, but would
>> not prevent the size of the list from increasing without bound. I
>> think that would probably be good enough, because the current
>> throttling behavior is purely accidental and doesn't *guarantee* a
>> limit on the size of the pending list.
>
> Added, see attached patch (based on v3.1)
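
The recommendation quoted above can be modelled roughly as follows. This is a toy sketch in Python, not the patch itself: the pending list is a deque of page numbers, and clean_pending_list and on_step are hypothetical names. The on_step hook stands in for concurrent inserters appending to the list while cleanup runs.

```python
from collections import deque


def clean_pending_list(pending, on_step=None):
    """Flush pages from the head of `pending`, stopping once the page that
    was the tail when cleanup started has been flushed.

    Returns the list of pages flushed. `on_step`, if given, is called after
    each page is flushed and may append new pages to `pending`.
    """
    if not pending:
        return []
    stop_at = pending[-1]  # tail page at the moment cleanup starts
    flushed = []
    while pending:
        page = pending.popleft()
        flushed.append(page)
        if on_step is not None:
            on_step(pending)  # simulated concurrent insertions
        if page == stop_at:
            break  # the cleaner is let off the hook here
    return flushed
```

Starting with three pending pages and an on_step that appends two new pages per step, the cleaner flushes the original three pages and stops, leaving six pages pending. The cleaner's work is bounded, but the list is longer than when it started, which is the trade-off described in the quoted paragraph.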

With this patch applied, extensive crash recovery testing produces a
couple of errors I have not seen before:

ERROR: attempted to delete invisible tuple

ERROR: unexpected chunk number 1 (expected 2) for toast value 100338365 in pg_toast_16425

I've restarted the test harness with intentional crashes turned off,
to see if the problems are related to crash recovery or are more
generic than that.

I've never seen these particular problems before, so I don't have much
insight into what might be going on or how to debug it.

Cheers,

Jeff
