Quick Links

Re: new heapcheck contrib module

From:	Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: new heapcheck contrib module
Date:	2020-07-31 01:38:43
Message-ID:	142B4FB1-C7ED-43E4-95E3-FF80DC26F3A6@enterprisedb.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

> On Jul 30, 2020, at 5:53 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Thu, Jul 30, 2020 at 6:10 PM Mark Dilger
> <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>> No, that wasn't my concern. I was thinking about CLOG entries disappearing during the scan as a consequence of concurrent vacuums, and the effect that would have on the validity of the cached [relfrozenxid..next_valid_xid] range. In the absence of corruption, I don't immediately see how this would cause any problems. But for a corrupt table, I'm less certain how it would play out.
>
> Oh, hmm. I wasn't thinking about that problem. I think the only way
> this can happen is if we read a page and then, before we try to look
> up the CID, vacuum zooms past, finishes the whole table, and truncates
> clog. But if that's possible, then it seems like it would be an issue
> for SELECT as well, and it apparently isn't, or we would've done
> something about it by now. I think the reason it's not possible is
> because of the locking rules described in
> src/backend/storage/buffer/README, which require that you hold a
> buffer lock until you've determined that the tuple is visible. Since
> you hold a share lock on the buffer, a VACUUM that hasn't already
> processed that freeze the tuples in that buffer; it would need an
> exclusive lock on the buffer to do that. Therefore it can't finish and
> truncate clog either.
>
> Now, you raise the question of whether this is still true if the table
> is corrupt, but I don't really see why that makes any difference.
> VACUUM is supposed to freeze each page it encounters, to the extent
> that such freezing is necessary, and with Andres's changes, it's
> supposed to ERROR out if things are messed up. We can postulate a bug
> in that logic, but inserting a VACUUM-blocking lock into this tool to
> guard against a hypothetical vacuum bug seems strange to me. Why would
> the right solution not be to fix such a bug if and when we find that
> there is one?

Since I can't think of a plausible concrete example of corruption which would elicit the problem I was worrying about, I'll withdraw the argument. But that leaves me wondering about a comment that Andres made upthread:

> On Apr 20, 2020, at 12:42 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:

> I don't think random interspersed uses of CLogTruncationLock are a good
> idea. If you move to only checking visibility after tuple fits into
> [relfrozenxid, nextXid), then you don't need to take any locks here, as
> long as a lock against vacuum is taken (which I think this should do
> anyway).

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Re: new heapcheck contrib module at 2020-07-31 00:53:32 from Robert Haas

Responses

Re: new heapcheck contrib module at 2020-07-31 12:02:55 from Robert Haas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	James Coleman	2020-07-31 01:39:35	Re: HashAgg's batching counter starts at 0, but Hash's starts at 1. (now: incremental sort)
Previous Message	Peter Geoghegan	2020-07-31 01:33:32	Re: HashAgg's batching counter starts at 0, but Hash's starts at 1. (now: incremental sort)