Re: Amcheck: do rightlink verification with lock coupling

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Amcheck: do rightlink verification with lock coupling
Date: 2020-01-11 02:49:33
Message-ID: CAH2-WzntedoYDfGFK_TwEOVuLQHhzOuubV996dHKmWkzA101WQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 10, 2020 at 5:45 PM Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> Peter, any opinion on this proposed amcheck patch? In the other thread
> [1] you seemed to agree this is worth checking, and Alvaro's proposal to
> make this check optional seems like a reasonable compromise with respect
> to the locking.

It's a good idea, and it probably doesn't even need to be made
optional -- lock coupling to the right is safe on a primary, and
should also be safe on standbys (though I should triple check the REDO
routines to be sure). The patch only does lock coupling when it proves
necessary, which ought to only happen when there is a concurrent page
split, which ought to be infrequent. Maybe there is no need to
compromise.
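
To make the "couple only when necessary" idea concrete, here is a toy
model in Python. It is not amcheck code: the Page class, the
per-page lock, and rightlink_is_consistent() are hypothetical names
invented for this sketch, and it assumes the check in question is
"does the right sibling's left link point back at the target". The
first pass looks at the sibling without holding the target's lock; only
if that looks wrong does it retry while holding both locks, always
acquired left to right.

```python
import threading
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Page:
    """Toy stand-in for a B-Tree page: sibling links plus a page lock."""
    blkno: int
    left: Optional[int] = None
    right: Optional[int] = None
    lock: threading.Lock = field(default_factory=threading.Lock)


def rightlink_is_consistent(pages: Dict[int, Page], target_blkno: int) -> bool:
    """Return True if the right sibling's left link points back at target."""
    target = pages[target_blkno]

    # Pass 1: no coupling. Read the right link, drop the target lock,
    # then inspect the sibling on its own.
    with target.lock:
        right_blkno = target.right
    if right_blkno is None:
        return True  # rightmost page, nothing to verify
    right = pages[right_blkno]
    with right.lock:
        if right.left == target_blkno:
            return True

    # Pass 2: the mismatch may only reflect a concurrent split of the
    # target, so re-read the right link and check again while holding
    # both locks (target first, then its right sibling).
    with target.lock:
        right_blkno = target.right
        if right_blkno is None:
            return True
        right = pages[right_blkno]
        with right.lock:
            return right.left == target_blkno
```

In the common case only the first pass runs, so the cost of holding two
locks at once is paid only when the cheap check fails.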

I'm curious why Andrey's corruption problems were not detected by the
cross-page amcheck test, though. We compare the first non-pivot tuple
on the right sibling leaf page with the last one on the target page,
towards the end of bt_target_page_check() -- isn't that almost as good
as what you have here in practice? I probably would have added
something like this myself earlier, if I had reason to think that
verification would be a lot more effective that way.
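
For reference, the existing cross-page test can be pictured roughly as
follows. This is a simplified Python sketch, not the code in
bt_target_page_check(): cross_page_order_ok() is a hypothetical name,
each leaf page is modeled as a sorted list of integer keys, and equal
keys across the page boundary are assumed to be acceptable.

```python
from typing import List, Optional


def cross_page_order_ok(target_keys: List[int],
                        right_keys: Optional[List[int]]) -> bool:
    """Check the last key on the target against the first key on the right page.

    right_keys is None when the target is the rightmost page.
    """
    if not target_keys or not right_keys:
        return True  # nothing to compare
    # The last item on the target must not sort after the first
    # non-pivot item on its right sibling.
    return target_keys[-1] <= right_keys[0]
```

Note that this sketch looks only at key order across the boundary; it
never reads the sibling's left link.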

To be clear, I believe that Andrey wrote this patch for a reason -- I
assume that it makes a noticeable and consistent difference. I would
like to gain a better understanding of why that was for my own
benefit, though. For example, it might be that page deletion was a
factor that made the test I mentioned less effective. I care about the
specifics.

--
Peter Geoghegan
