[PATCH] Improve amcheck to also check UNIQUE constraint in btree index.

From: Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>
To: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
Date: 2021-02-08 10:46:18
Message-ID: CALT9ZEHRn5xAM5boga0qnrCmPV52bScEK2QnQ1HmUZDD301JEg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, hackers!

It seems that if btree index with a unique constraint is corrupted by
duplicates, amcheck now can not catch this. Reindex becomes impossible as
it throws an error but otherwise the index will let the user know that it
is corrupted, and amcheck will tell that the index is clean. So I'd like to
propose a short patch to improve amcheck for checking the unique
constraint. It will output tid's of tuples that are duplicated in the index
(i.e. more than one tid for the same index key is visible) and the user can
easily investigate and delete corresponding table entries.

0001 - is the actual patch, and
0002 - is a temporary hack for testing. It will allow inserting duplicates
in a table even if an index with the exact name "idx" has a unique
constraint (generally it is prohibited to insert). Then a new amcheck will
tell us about these duplicates. It's pity but testing can not be done
automatically, as it needs a core recompile. For testing I'd recommend a
protocol similar to the following:

- Apply patch 0002
- Set autovaccum = off in postgresql.conf

*create table tbl2 (a varchar(50), b varchar(150), c bytea, d
varchar(50));create unique index idx on tbl2(a,b);insert into tbl2 select
i::text::varchar, i::text::varchar, i::text::bytea, i::text::varchar from
generate_series(0,3) as i, generate_series(0,10000) as x;*

So we'll have a generous amount of duplicates in the table and index. Then:

*create extension amcheck;*
*select bt_index_check('idx', true);*

There will be output about each pair of duplicates: index tid's, position
in a posting list (if the index item is deduplicated) and table tid's.

WARNING: index uniqueness is violated for index "idx": Index tid=(26,6)
posting 218 and posting 219 (point to heap tid=(126,93) and tid=(126,94))
page lsn=0/1B3D420.
WARNING: index uniqueness is violated for index "idx": Index tid=(26,6)
posting 219 and posting 220 (point to heap tid=(126,94) and tid=(126,95))
page lsn=0/1B3D420.
WARNING: index uniqueness is violated for index "idx": Index tid=(26,6)
posting 220 and posting 221 (point to heap tid=(126,95) and tid=(126,96))
page lsn=0/1B3D420.
WARNING: index uniqueness is violated for index "idx": Index tid=(26,6)
posting 221 and tid=(26,7) posting 0 (point to heap tid=(126,96) and
tid=(126,97)) page lsn=0/1B3D420.
WARNING: index uniqueness is violated for index "idx": Index tid=(26,7)
posting 0 and posting 1 (point to heap tid=(126,97) and tid=(126,98)) page
lsn=0/1B3D420.
WARNING: index uniqueness is violated for index "idx": Index tid=(26,7)
posting 1 and posting 2 (point to heap tid=(126,98) and tid=(126,99)) page
lsn=0/1B3D420.

Things to notice in the test output:
- cross-page duplicates (when some of them on the one index page and the
other are on the next). (Under patch 0002 they are marked by an additional
message "*INFO: cross page equal keys"* to catch them among the other)

- If we delete table entries corresponding to some duplicates in between
the other duplicates (vacuum should be off), the message for this entry
should disappear but the other duplicates should be detected as previous.

*delete from tbl2 where ctid::text='(126,94)';*
*select bt_index_check('idx', true);*

WARNING: index uniqueness is violated for index "idx": Index tid=(26,6)
posting 218 and posting 220 (point to heap tid=(126,93) and tid=(126,95))
page lsn=0/1B3D420.
WARNING: index uniqueness is violated for index "idx": Index tid=(26,6)
posting 220 and posting 221 (point to heap tid=(126,95) and tid=(126,96))
page lsn=0/1B3D420.
WARNING: index uniqueness is violated for index "idx": Index tid=(26,6)
posting 221 and tid=(26,7) posting 0 (point to heap tid=(126,96) and
tid=(126,97)) page lsn=0/1B3D420.
WARNING: index uniqueness is violated for index "idx": Index tid=(26,7)
posting 0 and posting 1 (point to heap tid=(126,97) and tid=(126,98)) page
lsn=0/1B3D420.
WARNING: index uniqueness is violated for index "idx": Index tid=(26,7)
posting 1 and posting 2 (point to heap tid=(126,98) and tid=(126,99)) page
lsn=0/1B3D420.

Caveat: if the first entry on the next index page has a key equal to the
key on a previous page AND all heap tid's corresponding to this entry are
invisible, currently cross-page check can not detect unique constraint
violation between previous index page entry and 2nd, 3d and next current
index page entries. In this case, there would be a message that recommends
doing VACUUM to remove the invisible entries from the index and repeat the
check. (Generally, it is recommended to do vacuum before the check, but for
the testing purpose I'd recommend turning it off to check the detection of
visible-invisible-visible duplicates scenarios)

Your feedback is very much welcome, as usual.

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com <http://www.postgrespro.com>

Attachment Content-Type Size
v1-0002-Ignore-unique-constraint-check-on-btree-insert.-P.patch application/octet-stream 2.7 KB
v1-0001-Make-amcheck-checking-UNIQUE-constraint-for-btree.patch application/octet-stream 13.1 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-02-08 10:59:25 Re: Single transaction in the tablesync worker?
Previous Message Heikki Linnakangas 2021-02-08 10:17:11 Re: [POC] verifying UTF-8 using SIMD instructions