Re: new heapcheck contrib module

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new heapcheck contrib module
Date: 2020-07-31 01:38:43
Message-ID: 142B4FB1-C7ED-43E4-95E3-FF80DC26F3A6@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Jul 30, 2020, at 5:53 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Thu, Jul 30, 2020 at 6:10 PM Mark Dilger
> <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>> No, that wasn't my concern. I was thinking about CLOG entries disappearing during the scan as a consequence of concurrent vacuums, and the effect that would have on the validity of the cached [relfrozenxid..next_valid_xid] range. In the absence of corruption, I don't immediately see how this would cause any problems. But for a corrupt table, I'm less certain how it would play out.
>
> Oh, hmm. I wasn't thinking about that problem. I think the only way
> this can happen is if we read a page and then, before we try to look
> up the CID, vacuum zooms past, finishes the whole table, and truncates
> clog. But if that's possible, then it seems like it would be an issue
> for SELECT as well, and it apparently isn't, or we would've done
> something about it by now. I think the reason it's not possible is
> because of the locking rules described in
> src/backend/storage/buffer/README, which require that you hold a
> buffer lock until you've determined that the tuple is visible. Since
> you hold a share lock on the buffer, a VACUUM that hasn't already
> processed that freeze the tuples in that buffer; it would need an
> exclusive lock on the buffer to do that. Therefore it can't finish and
> truncate clog either.
>
> Now, you raise the question of whether this is still true if the table
> is corrupt, but I don't really see why that makes any difference.
> VACUUM is supposed to freeze each page it encounters, to the extent
> that such freezing is necessary, and with Andres's changes, it's
> supposed to ERROR out if things are messed up. We can postulate a bug
> in that logic, but inserting a VACUUM-blocking lock into this tool to
> guard against a hypothetical vacuum bug seems strange to me. Why would
> the right solution not be to fix such a bug if and when we find that
> there is one?

Since I can't think of a plausible concrete example of corruption which would elicit the problem I was worrying about, I'll withdraw the argument. But that leaves me wondering about a comment that Andres made upthread:

> On Apr 20, 2020, at 12:42 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:

> I don't think random interspersed uses of CLogTruncationLock are a good
> idea. If you move to only checking visibility after tuple fits into
> [relfrozenxid, nextXid), then you don't need to take any locks here, as
> long as a lock against vacuum is taken (which I think this should do
> anyway).


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message James Coleman 2020-07-31 01:39:35 Re: HashAgg's batching counter starts at 0, but Hash's starts at 1. (now: incremental sort)
Previous Message Peter Geoghegan 2020-07-31 01:33:32 Re: HashAgg's batching counter starts at 0, but Hash's starts at 1. (now: incremental sort)