Re: new heapcheck contrib module

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Amul Sul <sulamul(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new heapcheck contrib module
Date: 2020-10-08 01:42:00
Message-ID: CAH2-Wzn38UhrmZomiF_FroR=WUYy7hNx1grmrPRggPnTnxzVRA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Oct 5, 2020 at 5:24 PM Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
> > I don't see how verify_heapam will avoid raising an error during basic
> > validation from PageIsVerified(), which will violate the guarantee
> > about not throwing errors. I don't see that as a problem myself, but
> > presumably you will.
>
> My concern is not so much that verify_heapam will stop with an error, but rather that it might trigger a panic that stops all backends. Stopping with an error merely because it hits corruption is not ideal, as I would rather it completed the scan and reported all corruptions found, but that's minor compared to the damage done if verify_heapam creates downtime in a production environment offering high availability guarantees. That statement might seem nuts, given that the corrupt table itself would be causing downtime, but that analysis depends on assumptions about table access patterns, and there is no a priori reason to think that corrupt pages are necessarily ever being accessed, or accessed in a way that causes crashes (rather than merely wrong results) outside verify_heapam scanning the whole table.

That seems reasonable to me. I think that it makes sense to never take
down the server in a non-debug build with verify_heapam. That's not
what I took away from your previous remarks on the issue, but perhaps
it doesn't matter now.

--
Peter Geoghegan
