Re: new heapcheck contrib module

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new heapcheck contrib module
Date: 2020-05-13 19:21:49
Message-ID: CA+TgmoaZKW3Qgz25fS0P5ygMjvy_J-tyv=mdG-zSYGJytjk_AQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, May 12, 2020 at 11:06 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> try/catch seems like the way to do it. Not all amcheck errors come
> from amcheck -- some are things that the backend code does, that are
> known to appear in amcheck from time to time. I'm thinking in
> particular of the
> table_index_build_scan()/heapam_index_build_range_scan() errors, as
> well as the errors from _bt_checkpage().

That would require the use of a subtransaction.

> You've quasi-duplicated some sensitive code to do that much, which
> seems excessive. But it's also not enough.

I think this is a good summary of the problems in this area. On the
one hand, I think it's hideous that we sanity check user input to
death, but blindly trust the bytes on disk to the point of seg
faulting if they're wrong. The idea that int4 + int4 has to have
overflow checking because otherwise a user might be sad when they get
a negative result from adding two negative numbers, while at the same
time supposing that the same user will be unwilling to accept the
performance hit to avoid crashing if they have a bad tuple, is quite
suspect in my mind. The overflow checking is also expensive, but we do
it because it's the right thing to do, and then we try to minimize the
overhead. It is unclear to me why we shouldn't also take that approach
with bytes that come from disk. In particular, using Assert() checks
for such things instead of elog() is basically Assert(there is no such
thing as a corrupted database).

On the other hand, that problem is clearly way above this patch's pay
grade. There's a lot of stuff all over the code base that would have
to be changed to fix it. It can't be done as an incidental thing as
part of this patch or any other. It's a massive effort unto itself. We
need to somehow draw a clean line between what this patch does and
what it does not do, such that the scope of this patch remains
something achievable. Otherwise, we'll end up with nothing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2020-05-13 19:32:33 Re: Add "-Wimplicit-fallthrough" to default flags
Previous Message Robert Haas 2020-05-13 19:10:30 Re: tablespace_map code cleanup