Re: Amcheck verification of GiST and GIN

From: Jose Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>
To: Andrew Borodin <amborodin86(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Amcheck verification of GiST and GIN
Date: 2022-11-25 02:04:37
Message-ID: 45772f21-f9ac-535a-5050-ff89c389cd5@benetasso.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello.

I reviewed this patch and I would like to share some comments.

It compiled with those 2 warnings:

verify_gin.c: In function 'gin_check_parent_keys_consistency':
verify_gin.c:481:38: warning: declaration of 'maxoff' shadows a previous
local [-Wshadow=compatible-local]
481 | OffsetNumber maxoff =
PageGetMaxOffsetNumber(page);
| ^~~~~~
verify_gin.c:453:41: note: shadowed declaration is here
453 | maxoff;
| ^~~~~~
verify_gin.c:423:25: warning: unused variable 'heapallindexed'
[-Wunused-variable]
423 | bool heapallindexed = *((bool*)callback_state);
| ^~~~~~~~~~~~~~

Also, I'm not sure about postgres' headers conventions, inside amcheck.h,
there is "miscadmin.h" included, and inside verify_gin.c, verify_gist.h
and verify_nbtree.c both amcheck.h and miscadmin.h are included.

About the documentation, the bt_index_parent_check has comments about the
ShareLock and "SET client_min_messages = DEBUG1;", and both
gist_index_parent_check and gin_index_parent_check lack it. verify_gin
uses DEBUG3, I'm not sure if it is on purpose, but it would be nice to
document it or put DEBUG1 to be consistent.

I lack enough context to do a deep review on the code, so in this area
this patch needs more eyes.

I did the following test:

postgres=# create table teste (t text, tv tsvector);
CREATE TABLE
postgres=# insert into teste values ('hello', 'hello'::tsvector);
INSERT 0 1
postgres=# create index teste_tv on teste using gist(tv);
CREATE INDEX
postgres=# select pg_relation_filepath('teste_tv');
pg_relation_filepath
----------------------
base/5/16441
(1 row)

postgres=#
\q
$ bin/pg_ctl -D data -l log
waiting for server to shut down.... done
server stopped
$ okteta base/5/16441 # I couldn't figure out the dd syntax to change the
1FE9 to '0'
$ bin/pg_ctl -D data -l log
waiting for server to start.... done
server started
$ bin/psql -U ze postgres
psql (16devel)
Type "help" for help.

postgres=# SET client_min_messages = DEBUG3;
SET
postgres=# select gist_index_parent_check('teste_tv'::regclass, true);
DEBUG: verifying that tuples from index "teste_tv" are present in "teste"
ERROR: heap tuple (0,1) from table "teste" lacks matching index tuple
within index "teste_tv"
postgres=#

A simple index corruption in gin:

postgres=# CREATE TABLE "gin_check"("Column1" int[]);
CREATE TABLE
postgres=# insert into gin_check values (ARRAY[1]),(ARRAY[2]);
INSERT 0 2
postgres=# CREATE INDEX gin_check_idx on "gin_check" USING GIN("Column1");
CREATE INDEX
postgres=# select pg_relation_filepath('gin_check_idx');
pg_relation_filepath
----------------------
base/5/16453
(1 row)

postgres=#
\q
$ bin/pg_ctl -D data -l logfile stop
waiting for server to shut down.... done
server stopped
$ okteta data/base/5/16453 # edited some bits near 3FCC
$ bin/pg_ctl -D data -l logfile start
waiting for server to start.... done
server started
$ bin/psql -U ze postgres
psql (16devel)
Type "help" for help.

postgres=# SET client_min_messages = DEBUG3;
SET
postgres=# SELECT gin_index_parent_check('gin_check_idx', true);
ERROR: number of items mismatch in GIN entry tuple, 49 in tuple header, 1
decoded
postgres=#

There are more code paths to follow to check the entire code, and I had a
hard time to corrupt the indices. Is there any automated code to corrupt
index to test such code?

--
Jose Arthur Benetasso Villanova

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2022-11-25 02:25:30 Re: Allow file inclusion in pg_hba and pg_ident files
Previous Message David Rowley 2022-11-24 23:34:29 Re: Bug in row_number() optimization