| From: | Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com> |
|---|---|
| To: | Andrey Borodin <x4mmm(at)yandex-team(dot)ru> |
| Cc: | Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Wenbo Lin <linwenbo1994(at)gmail(dot)com> |
| Subject: | Re: amcheck: add index-all-keys-match verification for B-Tree |
| Date: | 2026-04-28 11:01:51 |
| Message-ID: | CAKZiRmzigRhZCunAGAu8iYyWkUdzkn0PdSyC_Ur_pMwHjCer=Q@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi Andrey,
On Wed, Apr 8, 2026 at 3:17 PM Jakub Wartak
<jakub(dot)wartak(at)enterprisedb(dot)com> wrote:
>
> On Thu, Mar 5, 2026 at 7:00 AM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> > Thanks for the review! I've addressed all these issues in patch steps 4 and 5. Other steps are intact. PFA.
>
> Hi Andrey and Wenboo,
>
> 1. I tried the v4 to see how it would cope with heap missing segments[0]
> (with -Dsegsize_blocks=6), so given:
[..]
> postgres=# select bt_index_check(index=>'ptest_pkey',
> heapallindexed=>true, checkunique=>true, indexallkeysmatch=>true);
> ERROR: could not open file "base/5/24578.1" (target block 6): No such
> file or directory
>
> this error was not quite expected for me (altough it is much better than nothing
> as on master today), because patch v4-0002 assumes in
> bt_verify_index_tuple_points_to_heap() that we should get false and error in
> more human-firendly way ("index tuple points to non-existent heap"). The error
> itself is coming out of the following stacktrace:
[..]
> So with the attached patch it becomes more human understandable, but it is
> somehow orthogonal check? Still maybe we can combine this under this $thread
> ambrella and make it as an option of indexallkeysmatch too? Attached
> behaves likes this:
>
> postgres=# select bt_index_check(index=>'ptest_pkey');
> ERROR: index line pointer in index "ptest_pkey" points to missing
> page in table "ptest"
> DETAIL: Index tid=(5,14) points to heap tid=(6,1) but heap has only 6 blocks.
> HINT: this can be caused by lost relation segment (missing or removed file).
>
> (note it does not need to indexallkeysmatch, but it's related)
>
[..]
> 2. For some reason email from Wenbo didn't get to my emailbox, but I can
> see his email here:
> https://www.postgresql.org/message-id/resend/ba92ac77-24f1-44ad-abf0-11e64e0a7831%40gmail.com
> (and resending is is somewhat not effective)
>
> > I've been reviewing the patch and ran some concurrent tests against it.
> > I found an issue with false positive corruption reports under concurrent
> > VACUUM.
> >
> > + slot = table_slot_create(state->heaprel, NULL);
> > + found = table_tuple_fetch_row_version(state->heaprel, tid,
> > + SnapshotAny, slot);
> > + if (!found)
> > + {
> > + ExecDropSingleTupleTableSlot(slot);
> > + ereport(ERROR,
> >
> >
> > bt_verify_index_tuple_points_to_heap uses SnapshotAny with
> > table_tuple_fetch_row_version to distinguish "tuple doesn't exist" from
> > "tuple exists but is dead". However, bt_index_check only holds
> > AccessShareLock which compatible with VACUUM's ShareUpdateExclusiveLock.
> > A concurrent VACUUM Phase 1 can prune heap pages(ItemIdSetDead) between
> > Bloom filter probe and the heap lookup. Causing
> > table_tuple_fetch_row_version to return false for a legal
> > dead-but-now-pruned tuple and a false positive corruption report.
> >
> > The table_tuple_satisfies_snapshot check does not help, it only runs
> > when the fetch succeeds(LP_NORMAL). Once VACUUM sets LP_DEAD,
> > heapam_fetch sees !ItemIdIsNormal(lp) and returns false before any
> > snapshot checks.
> >
> > The reproduce should be: DELETE all rows form a big table, then launch
> > VACUUM and bt_index_check concurrently. A POC test script attached.
>
> I think this is spot-on. I haven't run repro, but if we take ASL and VACUUM
> is allowed to run (under ShareUpdateExclusiveLock) then amcheck performs the
> heap bloom building (but ignores dead tuples, while btree still can
> point to those) (..)
[..]
> [0] - https://www.postgresql.org/message-id/flat/013D63E2-5D75-492E-85FF-1D5CC0148C82%40gmail.com
Here's my attempt at the fixing the LP_DEAD tuples being reported as false
positives by the v4 patchset as stated earlier. The simplest way for the
v4 to hit those was to:
1) get a fresh initdb cluster
2) run it on all pg_catalog tables/indexes, and to me it (v4*) always reliably
spotted some LP_DEAD ones there, e.g.:
ERROR: index tuple in index "pg_proc_proname_args_nsp_index" points
to non-existent heap tuple in table "pg_proc"
DETAIL: Index tid=(5,21) points to heap tid=(18,32) that no longer exists.
postgres=# SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax,
t_ctid, t_infomask, t_infomask2 FROM
heap_page_items(get_raw_page('pg_catalog.pg_proc', 18)) WHERE lp =
32;
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_ctid |
t_infomask | t_infomask2
----+--------+----------+--------+--------+--------+--------+------------+-------------
32 | 0 | 3 | 0 | | | | |
where lp_flags=3 ==> LP_DEAD
When we do bt_index_check() still with AccessShareLock we may expect:
- concurrent VACUUM producing LP_DEADs
- SELECT doing opportunistic heap pruning (micro-vacuuming) and that is
allowed to run even with ShareUpdate (bt_index_parent_check)
- LP_DEAD being already there before we even start (aborted VACUUM?)
I've tried to enhance v4 based on above assumptions, so v5 is attached.
Note: AFAIU opportunistic pruning can produce LP_UNUSED only on HOT-chain
intermediate tuples, and those should be never referenced by a btree index.
I think it's safe to still to stick to AccessShareLock (earlier I've
mentioned ShareUpdateExclusiveLock, but it looks it is doable without that)
as we are not affected by truncation and we have xmin horizon so stuff should
not be removed from underneath us.
Note that the pg_amcheck/003_check (not contrib/amcheck) still blows-up:
not ok 46 - pg_amcheck over schema s2 with corrupt tables excluded
reports no corruption status (got 2 vs expected 0)
not ok 47 - pg_amcheck over schema s2 with corrupt tables excluded
reports no corruption stdout /(?^:^$)/
it's due to SELECT "amcheck_schema".bt_index_check(index := c.oid,
heapallindexed := false )
ERROR: could not open file "base/16384/16508": No such file or directory
being called on index, while the *heap* relation has been intentionally
removed by plan_to_remove_relation_file() from that test. So even if the
tables are exluded by the pg_amcheck --exclude-table , amcheck with
v5-0005 blows up on RelationGetNumberOfBlocks() as the file is gone.
IMHO, we should remove that tes there, but I have not done so far this
far as I simply do not know if that's acceptable.
Moving that check into heapallindexed option is also a possibilty, but
that would
hide cheap check under big and costly option (reading whole heap).
Also, some rough measurements: One can expect indexallkeysmatched to multiply
bt_index_check runtime by roughly ~5x compared to a bare check (all flags off)
run, or - as expected - ~2x when compared to heapallindexed (that makes sense,
we are building another full bloom filter). Sample performance baseline of
corruption checking seems to be like that:
- ~5s for COPY pgbench_accounts to /dev/null (wont discover missing stuff)
- ~2s for COPY (SELECT aid FROM pgbench_accounts) to /dev/null (IOT,
wont discover
missing stuff)
amcheck:
- ~1.5s for bt_index_check('pgbench_accounts_pkey' (with v5b-0005 might discover
loss of heap relation segments)
- ~2s for bt_index_check() with just checkunique=on
- ~7.1s for bt_index_check() with just heapallindexed=on
- ~8.1s for bt_index_check() with just indexallkeysmatched=on (this new feature,
discovers dangling/missing/corrupted entries as Andrey explain in
original post)
- ~14s for full verification (everything on)
-J.
| Attachment | Content-Type | Size |
|---|---|---|
| v5b-0002-amcheck-report-corruption-when-index-points-to-n.patch | text/x-patch | 2.7 KB |
| v5b-0005-amcheck-attempt-to-detect-lost-heap-segments-bas.patch | text/x-patch | 8.5 KB |
| v5b-0004-amcheck-address-review-fix-Bloom-filter-comment-.patch | text/x-patch | 2.8 KB |
| v5b-0001-amcheck-add-indexallkeysmatch-verification-for-B.patch | text/x-patch | 28.1 KB |
| v5b-0003-Document-indexallkeysmatch-parameter-for-amcheck.patch | text/x-patch | 5.8 KB |
| v5b-0006-amcheck-report-corruption-when-index-points-to-n.patch | text/x-patch | 4.8 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Dilip Kumar | 2026-04-28 11:03:34 | Include schema-qualified names in publication error messages. |
| Previous Message | Amit Kapila | 2026-04-28 10:58:29 | Re: [Patch]: Fix excessive ProcArrayLock acquisitions with subscription max_retention_duration=0 |