Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

From: Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>
To: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
Date: 2021-02-08 19:31:13
Message-ID: CALT9ZEHt72CJ83Da7UXy_+4h_i_i8p+XWotXfnKrA+nJ5NPuXg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 8 Feb 2021, 14:46 Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com wrote:

> Hi, hackers!
>
> It seems that if btree index with a unique constraint is corrupted by
> duplicates, amcheck now can not catch this. Reindex becomes impossible as
> it throws an error but otherwise the index will let the user know that it
> is corrupted, and amcheck will tell that the index is clean. So I'd like to
> propose a short patch to improve amcheck for checking the unique
> constraint. It will output tid's of tuples that are duplicated in the index
> (i.e. more than one tid for the same index key is visible) and the user can
> easily investigate and delete corresponding table entries.
>
> 0001 - is the actual patch, and
> 0002 - is a temporary hack for testing. It will allow inserting duplicates
> in a table even if an index with the exact name "idx" has a unique
> constraint (generally it is prohibited to insert). Then a new amcheck will
> tell us about these duplicates. It's pity but testing can not be done
> automatically, as it needs a core recompile. For testing I'd recommend a
> protocol similar to the following:
>
> - Apply patch 0002
> - Set autovaccum = off in postgresql.conf
>
>
>
> *create table tbl2 (a varchar(50), b varchar(150), c bytea, d
> varchar(50));create unique index idx on tbl2(a,b);insert into tbl2 select
> i::text::varchar, i::text::varchar, i::text::bytea, i::text::varchar from
> generate_series(0,3) as i, generate_series(0,10000) as x;*
>
> So we'll have a generous amount of duplicates in the table and index. Then:
>
> *create extension amcheck;*
> *select bt_index_check('idx', true);*
>
> There will be output about each pair of duplicates: index tid's, position
> in a posting list (if the index item is deduplicated) and table tid's.
>
> WARNING: index uniqueness is violated for index "idx": Index tid=(26,6)
> posting 218 and posting 219 (point to heap tid=(126,93) and tid=(126,94))
> page lsn=0/1B3D420.
> WARNING: index uniqueness is violated for index "idx": Index tid=(26,6)
> posting 219 and posting 220 (point to heap tid=(126,94) and tid=(126,95))
> page lsn=0/1B3D420.
> WARNING: index uniqueness is violated for index "idx": Index tid=(26,6)
> posting 220 and posting 221 (point to heap tid=(126,95) and tid=(126,96))
> page lsn=0/1B3D420.
> WARNING: index uniqueness is violated for index "idx": Index tid=(26,6)
> posting 221 and tid=(26,7) posting 0 (point to heap tid=(126,96) and
> tid=(126,97)) page lsn=0/1B3D420.
> WARNING: index uniqueness is violated for index "idx": Index tid=(26,7)
> posting 0 and posting 1 (point to heap tid=(126,97) and tid=(126,98)) page
> lsn=0/1B3D420.
> WARNING: index uniqueness is violated for index "idx": Index tid=(26,7)
> posting 1 and posting 2 (point to heap tid=(126,98) and tid=(126,99)) page
> lsn=0/1B3D420.
>
> Things to notice in the test output:
> - cross-page duplicates (when some of them on the one index page and the
> other are on the next). (Under patch 0002 they are marked by an additional
> message "*INFO: cross page equal keys"* to catch them among the other)
>
> - If we delete table entries corresponding to some duplicates in between
> the other duplicates (vacuum should be off), the message for this entry
> should disappear but the other duplicates should be detected as previous.
>
> *delete from tbl2 where ctid::text='(126,94)';*
> *select bt_index_check('idx', true);*
>
> WARNING: index uniqueness is violated for index "idx": Index tid=(26,6)
> posting 218 and posting 220 (point to heap tid=(126,93) and tid=(126,95))
> page lsn=0/1B3D420.
> WARNING: index uniqueness is violated for index "idx": Index tid=(26,6)
> posting 220 and posting 221 (point to heap tid=(126,95) and tid=(126,96))
> page lsn=0/1B3D420.
> WARNING: index uniqueness is violated for index "idx": Index tid=(26,6)
> posting 221 and tid=(26,7) posting 0 (point to heap tid=(126,96) and
> tid=(126,97)) page lsn=0/1B3D420.
> WARNING: index uniqueness is violated for index "idx": Index tid=(26,7)
> posting 0 and posting 1 (point to heap tid=(126,97) and tid=(126,98)) page
> lsn=0/1B3D420.
> WARNING: index uniqueness is violated for index "idx": Index tid=(26,7)
> posting 1 and posting 2 (point to heap tid=(126,98) and tid=(126,99)) page
> lsn=0/1B3D420.
>
> Caveat: if the first entry on the next index page has a key equal to the
> key on a previous page AND all heap tid's corresponding to this entry are
> invisible, currently cross-page check can not detect unique constraint
> violation between previous index page entry and 2nd, 3d and next current
> index page entries. In this case, there would be a message that recommends
> doing VACUUM to remove the invisible entries from the index and repeat the
> check. (Generally, it is recommended to do vacuum before the check, but for
> the testing purpose I'd recommend turning it off to check the detection of
> visible-invisible-visible duplicates scenarios)
>
> Your feedback is very much welcome, as usual.
>
> --
> Best regards,
> Pavel Borisov
>
> Postgres Professional: http://postgrespro.com <http://www.postgrespro.com>
>

There was typo, I mean, initially"...index will NOT let the user know that
it is corrupted..."

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Dilger 2021-02-08 21:46:50 Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
Previous Message Tom Lane 2021-02-08 19:29:41 Re: small test case for abbrev(cidr)