Re: Do not check unlogged indexes on standby

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Do not check unlogged indexes on standby
Date: 2019-08-13 17:30:58
Message-ID: CAH2-Wzmcut1uaTZ5L=NqC+mTQyQq43m_etZqcTHjM5tanmCKMg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 13, 2019 at 5:17 AM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> We have a bunch of internal testing HA clusters that suffered from corruption conditions.
> We fixed everything that can be detected with parent-check on primaries or usual check on standbys.
> (page updates were lost both on the primary and during WAL replay)
> But from time to time when clusters switch primary from one availability zone to another we observe
> "right sibling's left-link doesn't match: block 32709 links to 37022 instead of expected 40953 in index"

That sounds like an issue caused by a failure to replay all available
WAL, where only one page happened to get written out by a checkpoint
before a crash, or something along those lines. The cross-page checks
that bt_index_check() already performs wouldn't catch that.
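To make that failure mode concrete, here is a minimal single-threaded sketch. It assumes a toy in-memory model rather than real nbtree pages: `LeafPage`, `check_sibling_links()` and the three-block layout are hypothetical. A page split also rewrites the left-link of the old right sibling, so if that one page's update is lost, a walk to the right finds a left-link that still points at the pre-split page.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model: a leaf page is reduced to its two sibling links.
 * Block 0 is the metapage in nbtree, so 0 can mean "no sibling",
 * like nbtree's P_NONE. */
#define P_NONE 0u

typedef struct
{
    unsigned    left;   /* stands in for btpo_prev */
    unsigned    right;  /* stands in for btpo_next */
} LeafPage;

/*
 * Walk the leaf level from "leftmost" to the right and check that each
 * right sibling's left-link points back at the page we came from.
 * Returns the number of mismatches. Single-threaded: takes no locks.
 */
static int
check_sibling_links(const LeafPage *pages, unsigned npages, unsigned leftmost)
{
    int         mismatches = 0;
    unsigned    cur = leftmost;
    unsigned    steps = 0;

    assert(leftmost != P_NONE && leftmost < npages);

    /* The step limit keeps a corrupt cycle of right-links from looping forever. */
    while (pages[cur].right != P_NONE && steps++ < npages)
    {
        unsigned    rightsib = pages[cur].right;

        assert(rightsib < npages);
        if (pages[rightsib].left != cur)
        {
            printf("right sibling's left-link doesn't match: "
                   "block %u links to %u instead of expected %u\n",
                   rightsib, pages[rightsib].left, cur);
            mismatches++;
        }
        cur = rightsib;
    }
    return mismatches;
}
```

With blocks 1 and 2 reflecting a split of block 1, but block 3 still carrying its pre-split left-link of 1, the walk reports block 3 linking to 1 instead of the expected 2. The message text is only modeled on the one quoted above.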

> We are going to search for these clusters with this [0], tolerating a possible fraction of false positives; we have them anyway.
> But I think I could put some effort into making the corruption-detection tooling better.
> I think if we observe a link discrepancy, we can acquire locks on the left and right pages and recheck.

That's one possibility. When I first designed amcheck it was important
to be conservative, so I invented a general rule about never acquiring
multiple buffer locks at once. I still think that that was the correct
decision for the bt_downlink_check() check (the main extra
bt_index_parent_check() check), but I think that you're right about
retrying to verify the sibling links when bt_index_check() is called
from SQL.

nbtree will often "couple" buffer locks on the leaf level; it will
acquire a lock on a leaf page, and not release that lock until it has
also acquired a lock on the right sibling page (I'm mostly thinking of
_bt_stepright()). I am in favor of a patch that makes amcheck perform
sibling link verification within bt_index_check(), by retrying while
pessimistically coupling buffer locks. (Though I think that that
should just happen on the leaf level. We should not try to be too
clever about ignorable/half-dead/deleted pages, to be conservative.)
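The proposed retry could look roughly like this. It is a sketch under stated assumptions, not amcheck code: a pthread mutex stands in for each buffer content lock, and `Page`, `read_left()`, `read_right()` and `sibling_links_ok()` are hypothetical names. The first pass keeps the existing rule of holding at most one lock at a time; only on an apparent mismatch does it couple locks, left page first and then its right sibling, the same left-to-right direction nbtree uses when stepping right.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define P_NONE 0u           /* "no sibling", as in nbtree */

/* Hypothetical model: a mutex stands in for a buffer content lock. */
typedef struct
{
    pthread_mutex_t lock;
    unsigned        left;   /* stands in for btpo_prev */
    unsigned        right;  /* stands in for btpo_next */
} Page;

/* Read one link while holding only that page's lock. */
static unsigned
read_right(Page *pages, unsigned blk)
{
    pthread_mutex_lock(&pages[blk].lock);
    unsigned    right = pages[blk].right;
    pthread_mutex_unlock(&pages[blk].lock);
    return right;
}

static unsigned
read_left(Page *pages, unsigned blk)
{
    pthread_mutex_lock(&pages[blk].lock);
    unsigned    left = pages[blk].left;
    pthread_mutex_unlock(&pages[blk].lock);
    return left;
}

/*
 * Check the link pair between leaf page "leftblk" and its right sibling.
 * Returns false only if the pair disagrees while both locks are held.
 */
static bool
sibling_links_ok(Page *pages, unsigned leftblk)
{
    bool        ok = true;
    unsigned    rightsib;

    assert(leftblk != P_NONE);

    /* Optimistic pass: never hold more than one lock at a time. */
    rightsib = read_right(pages, leftblk);
    if (rightsib == P_NONE || read_left(pages, rightsib) == leftblk)
        return true;

    /*
     * Apparent mismatch, possibly a concurrent split between the two reads.
     * Retry with coupled locks: left page first, then its right sibling.
     * Re-read the right-link under the lock, since it may have changed.
     */
    pthread_mutex_lock(&pages[leftblk].lock);
    rightsib = pages[leftblk].right;
    if (rightsib != P_NONE)
    {
        pthread_mutex_lock(&pages[rightsib].lock);
        ok = (pages[rightsib].left == leftblk);
        pthread_mutex_unlock(&pages[rightsib].lock);
    }
    pthread_mutex_unlock(&pages[leftblk].lock);
    return ok;
}
```

Re-reading the left page's right-link after locking it is the important step: a concurrent split between the two optimistic reads can make a healthy pair look inconsistent, whereas in this model a disagreement seen with both locks held can only mean corruption. Like the proposal, the sketch covers only the leaf level and does not special-case half-dead or deleted pages.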

--
Peter Geoghegan
