Re: BUG #17245: Index corruption involving deduplicated entries

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Kamigishi Rei <iijima(dot)yun(at)koumakan(dot)jp>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: BUG #17245: Index corruption involving deduplicated entries
Date: 2021-10-30 22:11:18
Message-ID: CAH2-Wzk-4_raTzawWGaiqNvkpwDXxv3y1AQhQyUeHfkU=tFCeA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Sat, Oct 30, 2021 at 1:42 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> I think it probably is worth adding an error check someplace that verifies
> that problems of this kind will be detected with, uh, less effort.

The draft fix is something I wanted to get out quickly to signal that
the bug is definitely going to be fixed soon. I'll need to carefully
think about this some more on Monday, to make sure that it becomes
much harder to break in roughly the same way once again.

> I think it'd also be good to add a test that specifically verifies that
> parallel vacuum doesn't have a bug around "parallel worthy" and not "parallel
> worthy" indexes. It's too easy a mistake to make, and because visible
> corruption is delayed, it's likely that we won't detect such cases.

I agree.

> ISTM that at least a basic version of this is worth doing as a check throwing
> an ERROR, rather than an assertion. It's hard to believe this'd be a
> significant portion of the cost of heap_index_delete_tuples(), and I think it
> would help catch problems a lot earlier.

I like the idea of doing that, but only on master. I think that we
could do this as an ERROR, provided we avoid adding a new inner loop
to heap_index_delete_tuples() --- we can definitely afford that. And
we can even have a nice, detailed error message that blames a
particular tuple from a specific index page, pointing to a specific
heap TID.

Separately, we should add an assertion that catches cases where a TID
in the index points to an LP_REDIRECT line pointer, which does not
point to a heap tuple with storage. That check has much less practical
value, and necessitates adding a new inner loop, which is why an
assert seems good enough to me. (The patch I've posted doesn't have
any of this LP_REDIRECT business, but my v2 revision will.)

--
Peter Geoghegan

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Andres Freund 2021-10-30 22:18:24 Re: BUG #17245: Index corruption involving deduplicated entries
Previous Message Peter Geoghegan 2021-10-30 21:46:13 Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum