Re: new heapcheck contrib module

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Amul Sul <sulamul(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new heapcheck contrib module
Date: 2020-11-19 21:50:33
Message-ID: 1B0E97CE-DEC4-4A9F-BF8F-38822A2840F5@enterprisedb.com
Lists: pgsql-hackers

> On Nov 19, 2020, at 11:47 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
>> I think in general you're worrying too much about the possibility of
>> this tool causing backend crashes. I think it's good that you wrote
>> the heapcheck code in a way that's hardened against that, and I think
>> we should try to harden other things as time permits. But I don't
>> think that the remote possibility of a crash due to the lack of such
>> hardening should dictate the design behavior of this tool. If the
>> crash possibilities are not remote, then I think the solution is to
>> fix them, rather than cutting out important checks.
>
> I couldn't agree more.

Owing to the run-time overhead it would entail, much of the backend code has not been, and probably will not be, hardened against corruption. The amcheck code uses backend code for accessing heaps and indexes, and only some of those uses can be preceded by sufficient safety checks to avoid stepping on landmines. It makes sense to me to have both a "don't run through minefields" option and a "go ahead, run through minefields" option for pg_amcheck. Users in different situations face different business consequences from bringing down the server in question.

As an example we've already looked at, checking the status of an xid against clog is a dangerous thing to do. I wrote a patch to make querying clog safer (0003) and a patch for pg_amcheck to use the safer interface (0004), and it looks unlikely that either of those will ever be committed. I doubt other backend hardening is any more likely to get committed. It doesn't follow that, if crash possibilities are not remote, we should therefore harden the backend. The performance considerations of the backend are not well aligned with the safety considerations of this tool. The backend code is written with the assumption of non-corrupt data, and this tool with the assumption of corrupt data, or at least a fair probability of it. I don't see how a one-size-fits-all approach to hardening will ever work.
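A minimal sketch of the kind of precondition check at issue, combined with the two-option policy described above. It assumes, for illustration only, that clog status can be trusted only for xids in the range [oldest_xid, next_xid). The names `CheckPolicy`, `XidAction`, `classify_xid`, and `xid_precedes` are hypothetical and are not the interface of the 0003/0004 patches. `xid_precedes` mirrors the modulo-2^32 comparison that PostgreSQL's TransactionIdPrecedes uses for normal xids.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;     /* 32-bit transaction id, as in PostgreSQL */

/* Hypothetical policy switch: may the checker "run through minefields"? */
typedef enum
{
    CHECK_SAFE_ONLY,        /* skip lookups whose preconditions cannot be verified */
    CHECK_ALLOW_UNSAFE      /* perform them anyway, accepting the risk */
} CheckPolicy;

/* Hypothetical outcome of classifying an xid found in a tuple header. */
typedef enum
{
    XID_LOOKUP_CLOG,        /* safe (or explicitly permitted) to consult clog */
    XID_REPORT_CORRUPTION   /* out of range: report it instead of consulting clog */
} XidAction;

/* Wraparound-aware "a precedes b" for normal xids: sign of the 32-bit difference. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Decide whether looking up 'xid' in clog is safe, assuming status is
 * only reliable for xids in [oldest_xid, next_xid).  Precondition: 'xid'
 * is a normal xid; special xids (invalid, bootstrap, frozen) never need
 * a clog lookup and are assumed to be handled before this is called.
 */
static XidAction
classify_xid(TransactionId xid, TransactionId oldest_xid,
             TransactionId next_xid, CheckPolicy policy)
{
    bool        in_range = !xid_precedes(xid, oldest_xid) &&
                           xid_precedes(xid, next_xid);

    if (in_range || policy == CHECK_ALLOW_UNSAFE)
        return XID_LOOKUP_CLOG;
    return XID_REPORT_CORRUPTION;
}
```

For example, with oldest_xid = 1000 and next_xid = 5000, xid 2000 is classified for lookup under either policy. Xid 6000 is reported as corruption under CHECK_SAFE_ONLY and looked up only under CHECK_ALLOW_UNSAFE. Because the comparison is modulo 2^32, the same check also behaves sensibly when the valid range straddles xid wraparound.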


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company