Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>
Cc: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>
Subject: Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
Date: 2021-03-01 21:05:03
Message-ID: 6FED9A6F-9AB3-4711-A5F7-55535ADF736B@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Mar 1, 2021, at 12:23 PM, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com> wrote:
>
> If bt_index_check() and bt_index_parent_check() are to have this functionality, shouldn't there be an option controlling it much as the option (heapallindexed boolean) controls checking whether all entries in the heap are indexed in the btree? It seems inconsistent to have an option to avoid checking the heap for that, but not for this. Alternately, this might even be better coded as its own function, named something like bt_unique_index_check() perhaps. I hope Peter might advise?
>
> As for heap checking, my reasoning was that we can not check whether a unique constraint violated by the index, without checking heap tuple visibility. I.e. we can have many identical index entries without uniqueness violated if only one of them corresponds to a visible heap tuple. So heap checking included in my patch is _necessary_ for unique constraint checking, it should not have an option to be disabled, otherwise, the only answer we can get is that unique constraint MAY be violated and may not be, which is quite useless. If you meant something different, please elaborate.

I completely agree that checking uniqueness requires looking at the heap, but I don't agree that every caller of bt_index_check on an index wants that particular check to be performed. There are multiple ways in which an index might be corrupt, and Peter wrote the code to only check some of them by default, with options to expand the checks to other things. This is why heapallindexed is optional. If you don't want to pay the price of checking all entries in the heap against the btree, you don't have to.

I'm not against running uniqueness checks on unique indexes. It seems fairly normal that a user would want that. Perhaps the option should default to 'true' if unspecified? But having no option at all seems to run contrary to how the other functionality is structured.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2021-03-01 21:05:08 Re: Improvements in prepared statements
Previous Message Tomas Vondra 2021-03-01 20:49:35 Re: Use extended statistics to estimate (Var op Var) clauses