From: | Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: new heapcheck contrib module |
Date: | 2020-07-31 01:38:43 |
Message-ID: | 142B4FB1-C7ED-43E4-95E3-FF80DC26F3A6@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> On Jul 30, 2020, at 5:53 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Thu, Jul 30, 2020 at 6:10 PM Mark Dilger
> <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>> No, that wasn't my concern. I was thinking about CLOG entries disappearing during the scan as a consequence of concurrent vacuums, and the effect that would have on the validity of the cached [relfrozenxid..next_valid_xid] range. In the absence of corruption, I don't immediately see how this would cause any problems. But for a corrupt table, I'm less certain how it would play out.
>
> Oh, hmm. I wasn't thinking about that problem. I think the only way
> this can happen is if we read a page and then, before we try to look
> up the CID, vacuum zooms past, finishes the whole table, and truncates
> clog. But if that's possible, then it seems like it would be an issue
> for SELECT as well, and it apparently isn't, or we would've done
> something about it by now. I think the reason it's not possible is
> because of the locking rules described in
> src/backend/storage/buffer/README, which require that you hold a
> buffer lock until you've determined that the tuple is visible. Since
> you hold a share lock on the buffer, a VACUUM that hasn't already
> processed that freeze the tuples in that buffer; it would need an
> exclusive lock on the buffer to do that. Therefore it can't finish and
> truncate clog either.
>
> Now, you raise the question of whether this is still true if the table
> is corrupt, but I don't really see why that makes any difference.
> VACUUM is supposed to freeze each page it encounters, to the extent
> that such freezing is necessary, and with Andres's changes, it's
> supposed to ERROR out if things are messed up. We can postulate a bug
> in that logic, but inserting a VACUUM-blocking lock into this tool to
> guard against a hypothetical vacuum bug seems strange to me. Why would
> the right solution not be to fix such a bug if and when we find that
> there is one?
Since I can't think of a plausible concrete example of corruption which would elicit the problem I was worrying about, I'll withdraw the argument. But that leaves me wondering about a comment that Andres made upthread:
> On Apr 20, 2020, at 12:42 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> I don't think random interspersed uses of CLogTruncationLock are a good
> idea. If you move to only checking visibility after tuple fits into
> [relfrozenxid, nextXid), then you don't need to take any locks here, as
> long as a lock against vacuum is taken (which I think this should do
> anyway).
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | James Coleman | 2020-07-31 01:39:35 | Re: HashAgg's batching counter starts at 0, but Hash's starts at 1. (now: incremental sort) |
Previous Message | Peter Geoghegan | 2020-07-31 01:33:32 | Re: HashAgg's batching counter starts at 0, but Hash's starts at 1. (now: incremental sort) |