Re: Amcheck: do rightlink verification with lock coupling

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Amcheck: do rightlink verification with lock coupling
Date: 2020-08-05 23:25:14
Message-ID: CAH2-WzkTRbmc0qgp_T9kNFAYBxvmzhm1Ek-5P_2XxNjay4y1-Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 4, 2020 at 9:33 AM Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> BTW, reviewing this patch again I cannot understand why we verify link coherence only on leaf level but not for internal pages?
> I do not see any actual problems here.

Well, I thought that it might be a good idea to limit it to the leaf
level, based on the theory that we rarely couple locks on internal
page levels in general. But yeah, that's probably not a good enough
reason to avoid lock coupling on internal pages. It's probably better
to do it everywhere than to explain why we don't do it on the internal
level -- the explanation will probably be confusing. And even if there
was a performance issue, it could only happen when there are
concurrent internal page splits -- but those are supposed to be rare.

Attached is v4, which now checks internal pages (just like leaf
pages). The main other change in this revised version is that we make
the error raised by bt_index_check() match the error used in the old
bt_index_parent_check() case -- we always want to blame the current
target page when amcheck complains (technically the page we blame when
the invariant fails isn't strictly guaranteed to be quite the same
thing as the target, but it's close enough to not really matter in
reality). Other adjustments:

* Added _bt_checkpage() calls for buffers, as is standard practice in nbtree.

* Added protection against locking the same page a second time in the
event of a faulty sibling link -- we should avoid a self-deadlock in
the event of a page that is corrupt in just the wrong way.

* Updated obsolescent comments that claimed that we never couple
buffer locks in amcheck.

I would like to commit something like this in the next day or two.

Thoughts?

--
Peter Geoghegan

Attachment Content-Type Size
v4-0001-Add-amcheck-sibling-link-checks.patch application/octet-stream 7.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2020-08-05 23:55:49 PROC_IN_ANALYZE stillborn 13 years ago
Previous Message David Rowley 2020-08-05 22:59:56 Re: Replace remaining StrNCpy() by strlcpy()