Re: amcheck (B-Tree integrity checking tool)

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Subject: Re: amcheck (B-Tree integrity checking tool)
Date: 2016-03-11 21:31:57
Message-ID: CAM3SWZQKDsDQ_kgBffDqy9cQwJNZ7xeF2D2Q6VAQZ+sh8WojUA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 10, 2016 at 9:18 AM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> I've looked at this patch today, mostly to educate myself, so this
> probably should not count as a full review. Anyway, the patch seems in
> excellent shape - it'd be great if all patches (including those written
> by me) had this level of comments/docs.

Thanks. I try. This patch is something I've been working on slowly and
steadily for over 2 years. It was time to see it through.

>> Note that no function currently checks that the index is consistent
>> with the heap, which would be very useful (that's probably how I'd
>> eventually target the heapam, actually).
>
> I'm afraid I don't understand what "target the heapam" means. Could you
> elaborate?

Just that that's how I'd make amcheck verify that the heap was sane,
if I was going to undertake that project. Or, I'd start there.

> I agree. It'd be nice to have a tool that also checks for data
> corruption the a lower level (e.g. that varlena headers are not
> corrupted etc.) but that seems like a task for some other tool.

I'm not sure that that's a task for another tool. I think that this
tool is scoped at detecting corruption, and that could work well for
the heap, too. There might be fewer interesting things to test about
the heap when indexes aren't involved. Following HOT chains through an
index, and verifying things like that about the heap as we go could
detect a lot of problematic cases.

> Can we come up with names that more clearly identify the difference
> between those two functions? I mean, _parent_ does not make it
> particularly obvious that the second function acquires exclusive lock
> and performs more thorough checks.

Dunno about that. It's defining characteristic is that it checks child
pages against their parent IMV. Things are not often defined in terms
of their locking requirements.

> This also applies to the name of the contrib module - it's not
> particularly obvious what "amcheck" unless you're familiar with the
> concept of access methods. Which is quite unlikely for regular users.
> Maybe something like "check_index" would be more illustrative?

I think that given the heap may be targeted in the future, amcheck is
more future-proof. I think Robert might have liked that name a year
ago or something. In general, I'm not too worried about what the
contrib module is called in the end.

> Well, I wouldn't count myself as an expert here, but do we actually need
> such protection? I mean, we only do such checks when holding an
> exclusive lock on the parent relation, no? And even if we don't vacuum
> can only remove entries from the pages - why should that cause
> violations of any invariants?

I think that a more worked out explanation for why the ExclusiveLock
is needed is appropriate. I meant to do that.

Basically, a heavier lock is needed because of page deletion by
VACUUM, which is the major source of complexity (much more than page
splits, I'd say). In general, the key space can be consolidated by
VACUUM in a way that breaks child page checking because the downlink
we followed from our target page is no longer the current downlink.
Page deletion deletes the right sibling's downlink, and the deleted
page's downlink is used for its sibling. You could have a race, where
there was a concurrent page deletion of the left sibling of the child
page, then a concurrent insertion into the newly expanded keyspace of
the parent. Therefore, the downlink in the parent (which is the
"target", to use the patch's terminology) would not be a lower bound
on items in the page.

That's a very low probability race, because it involves deletion a
page representing a range in the keyspace that has no live items, but
then immediately getting one just in time for our check. But, I'm
pretty determined that false positives like that need to be impossible
(or else it's a bug).

I have a paranoid feeling that there is a similar very low probability
race with the left-right keyspace checks, which don't have relation
ExclusiveLock protection (IOW, I think that that might be buggy). I
need to think about that some more, but current thinking is that it
would hardly matter if we used the highkey from right page rather than
the first data item, which definitely would be correct. And it would
be simpler.

> A few minor comments:

Thanks for catching those issues. Will fix.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joel Jacobson 2016-03-11 21:32:53 Re: [COMMITTERS] pgsql: Provide much better wait information in pg_stat_activity.
Previous Message Robert Haas 2016-03-11 21:29:39 Re: Relation extension scalability