Re: new heapcheck contrib module

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new heapcheck contrib module
Date: 2020-07-31 00:53:32
Message-ID: CA+TgmobtWL-s87WaAgsQ01AOnsen_iXmMpRUMUEc-nR32cvyhA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 30, 2020 at 6:10 PM Mark Dilger
<mark(dot)dilger(at)enterprisedb(dot)com> wrote:
> No, that wasn't my concern. I was thinking about CLOG entries disappearing during the scan as a consequence of concurrent vacuums, and the effect that would have on the validity of the cached [relfrozenxid..next_valid_xid] range. In the absence of corruption, I don't immediately see how this would cause any problems. But for a corrupt table, I'm less certain how it would play out.

Oh, hmm. I wasn't thinking about that problem. I think the only way
this can happen is if we read a page and then, before we try to look
up the CID, vacuum zooms past, finishes the whole table, and truncates
clog. But if that's possible, then it seems like it would be an issue
for SELECT as well, and it apparently isn't, or we would've done
something about it by now. I think the reason it's not possible is
because of the locking rules described in
src/backend/storage/buffer/README, which require that you hold a
buffer lock until you've determined that the tuple is visible. Since
you hold a share lock on the buffer, a VACUUM that hasn't already
processed that freeze the tuples in that buffer; it would need an
exclusive lock on the buffer to do that. Therefore it can't finish and
truncate clog either.

Now, you raise the question of whether this is still true if the table
is corrupt, but I don't really see why that makes any difference.
VACUUM is supposed to freeze each page it encounters, to the extent
that such freezing is necessary, and with Andres's changes, it's
supposed to ERROR out if things are messed up. We can postulate a bug
in that logic, but inserting a VACUUM-blocking lock into this tool to
guard against a hypothetical vacuum bug seems strange to me. Why would
the right solution not be to fix such a bug if and when we find that
there is one?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-07-31 01:25:22 Re: Missing CFI in hlCover()?
Previous Message Tom Lane 2020-07-31 00:32:37 Re: proposal: type info support functions for functions that use "any" type