Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

From: Noah Misch <noah(at)leadboat(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Maxim Orlov <orlovmg(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Greg Stark <stark(at)mit(dot)edu>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, Maxim Orlov <m(dot)orlov(at)postgrespro(dot)ru>, lubennikovaav(at)gmail(dot)com, Hamid Akhtar <hamid(dot)akhtar(at)percona(dot)com>
Subject: Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
Date: 2024-03-25 02:03:23
Message-ID: 20240325020323.fd.nmisch@google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 30, 2023 at 11:29:04AM +0400, Pavel Borisov wrote:
> On Wed, 25 Oct 2023 at 00:13, Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> > I think this patch is ready to go. I'm going to push it if there are
> > no objections.

> It's very good that this long-standing patch is finally committed. Thanks a lot!

Agreed. I gave this feature (commit 5ae2087) a try. Thanks for implementing
it. Could I get your input on two topics?

==== 1. Cross-page comparison at "first key on the next page" only

Cross-page comparisons got this discussion upthread:

On Tue, Mar 02, 2021 at 07:10:32PM -0800, Peter Geoghegan wrote:
> On Mon, Feb 8, 2021 at 2:46 AM Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com> wrote:
> > Caveat: if the first entry on the next index page has a key equal to the key on a previous page AND all heap tid's corresponding to this entry are invisible, currently cross-page check can not detect unique constraint violation between previous index page entry and 2nd, 3d and next current index page entries. In this case, there would be a message that recommends doing VACUUM to remove the invisible entries from the index and repeat the check. (Generally, it is recommended to do vacuum before the check, but for the testing purpose I'd recommend turning it off to check the detection of visible-invisible-visible duplicates scenarios)

> You're going to have to "couple" buffer locks in the style of
> _bt_check_unique() (as well as keeping a buffer lock on "the first
> leaf page a duplicate might be on" throughout) if you need the test to
> work reliably.

The amcheck feature has no lock coupling at its "first key on the next page"
check. I think that's fine, because amcheck takes one snapshot at the
beginning and looks for pairs of visible-to-that-snapshot heap tuples with the
same scan key. _bt_check_unique(), unlike amcheck, must catch concurrent
inserts. If amcheck "checkunique" wanted to detect duplicates that would
appear when all transactions commit, it would need lock coupling. (I'm not
suggesting it do that.) Do you see a problem with the lack of lock coupling
at "first key on the next page"?

> But why bother with that? The tool doesn't have to be
> 100% perfect at detecting corruption (nothing can be), and it's rather
> unlikely that it will matter for this test. A simple test that doesn't
> handle cross-page duplicates is still going to be very effective.

I agree, but perhaps the "first key on the next page" code is more complex
than general-case code would be. If the lack of lock coupling is fine, then I
think memory context lifecycle is the only obstacle making index page
boundaries special. Are there factors beyond that? We already have
state->lowkey kept across pages via MemoryContextAlloc(). Similar lines of
code could preserve the scan key for checkunique, making the "first key on the
next page" code unnecessary.

==== 2. Raises runtime by 476% despite no dead tuples

I used the following to create a table larger than RAM, 17GB table and 10GB
index on a system with 12GB RAM:

\set count 500000000
begin;
set maintenance_work_mem = '1GB';
set client_min_messages = debug1; -- debug2 is per-block spam
create temp table t as select n from generate_series(1,:count) t(n);
create unique index t_idx on t(n);
\dt+ t
\di+ t_idx
create extension amcheck;
select bt_index_check('t_idx', heapallindexed => false, checkunique => false);
select bt_index_check('t_idx', heapallindexed => false, checkunique => true);

Adding checkunique raised runtime from 58s to 276s, because it checks
visibility for every heap tuple. It could do the heap fetch and visibility
check lazily, when the index yields two heap TIDs for one scan key. That
should give zero visibility checks for this particular test case, and it
doesn't add visibility checks to bloated-table cases. Pseudo-code:

/*---
* scan_key is the last uniqueness-relevant scan key observed as
* bt_check_level_from_leftmost() moves right to traverse the leaf level.
* Will be NULL if the next tuple can't be the second tuple of a
* uniqueness violation, because any of the following apply:
* - we're evaluating the first leaf tuple of the entire index
* - last scan key had anynullkeys (never forms a uniqueness violation w/
* any other scan key)
*/
scan_key = NULL;
/*
* scan_key_known_visible==true indicates that scan_key_heap_tid is the
* last _visible_ heap TID observed for scan_key. Otherwise,
* scan_key_heap_tid is the last heap TID observed for scan_key, and we've
* not yet checked its visibility.
*/
bool scan_key_known_visible;
scan_key_heap_tid;
foreach itup (leftmost_leaf_level_tup .. rightmost_leaf_level_tup) {
if (itup.anynullkeys)
scan_key = NULL;
else if (scan_key != NULL &&
_bt_compare(scan_key, itup.key) == 0 &&
(scan_key_known_visible ||
(scan_key_known_visible = visible(scan_key_heap_tid))))
{
if (visible(itup.tid))
elog(ERROR, "duplicate in unique index");
}
else
{
/*
* No prior uniqueness-relevant key, or key changed, or we just
* learned scan_key_heap_tid was invisible. Make itup the
* standard by which we judge future index tuples as we move
* right.
*/
scan_key = itup.key;
scan_key_known_visible = false;
scan_key_heap_tid = itup.tid;
}
}

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2024-03-25 02:06:58 Re: Add Index-level REINDEX with multiple jobs
Previous Message David Rowley 2024-03-25 01:43:54 Re: Properly pathify the union planner