Re: [HACKERS] A design for amcheck heapam verification

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] A design for amcheck heapam verification
Date: 2018-01-11 10:14:06
Message-ID: 049AE496-791B-4C0E-8ACB-43832F9FA2B8@yandex-team.ru
Lists: pgsql-hackers

Hello!

I like the heapam verification functionality and am already using it, so I plan to review this patch, probably this week.

My use so far has given me some thoughts on the interface. Here is what I get:

# select bt_index_check('messagefiltervalue_group_id_59490523e6ee451f',true);
ERROR: XX001: heap tuple (45,21) from table "messagefiltervalue" lacks matching index tuple within index "messagefiltervalue_group_id_59490523e6ee451f"
HINT: Retrying verification using the function bt_index_parent_check() might provide a more specific error.
LOCATION: bt_tuple_present_callback, verify_nbtree.c:1316
Time: 45.668 ms

# select bt_index_check('messagefiltervalue_group_id_59490523e6ee451f');
bt_index_check
----------------

(1 row)
Time: 32.873 ms

# select bt_index_parent_check('messagefiltervalue_group_id_59490523e6ee451f');
ERROR: XX002: down-link lower bound invariant violated for index "messagefiltervalue_group_id_59490523e6ee451f"
DETAIL: Parent block=6259 child index tid=(1747,2) parent page lsn=4A0/728F5DA8.
LOCATION: bt_downlink_check, verify_nbtree.c:1188
Time: 391194.113 ms

It seems the new check, bt_index_check(..., true), runs roughly 8,500 times faster than bt_index_parent_check() (45.668 ms versus 391194.113 ms, almost four orders of magnitude) and still detects corruption in my index that plain bt_index_check() missed.
From this output I can see that there is corruption, but I cannot tell:
1. What the scale of the corruption is
2. Whether these corruptions are related

I think an interface that lists all errors, or the top N, could be useful.
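To make the suggestion concrete, here is a minimal sketch in C of the idea: instead of raising an ERROR at the first problem, the check records each problem, keeps the first N messages and counts all of them, so the total answers "what is the scale" and the listed TIDs help judge whether problems are related. All names here (CorruptionLog, corruption_log_init, corruption_log_report, MAX_REPORTS) are hypothetical and are not part of amcheck; a real implementation would have to fit into PostgreSQL's ereport() machinery.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical upper bound on how many corruption reports are kept. */
#define MAX_REPORTS 10
#define REPORT_LEN  256

typedef struct CorruptionLog
{
    int  limit;                            /* N: reports to keep (<= MAX_REPORTS) */
    int  nkept;                            /* reports stored so far */
    long ntotal;                           /* all problems seen, kept or not */
    char reports[MAX_REPORTS][REPORT_LEN]; /* first N messages, in scan order */
} CorruptionLog;

static void
corruption_log_init(CorruptionLog *errlog, int limit)
{
    assert(limit >= 0 && limit <= MAX_REPORTS);
    memset(errlog, 0, sizeof(*errlog));
    errlog->limit = limit;
}

/*
 * Record one problem at heap TID (block, offset) instead of aborting the
 * scan. Every problem is counted; only the first `limit` are described.
 */
static void
corruption_log_report(CorruptionLog *errlog, unsigned block, unsigned offset,
                      const char *what)
{
    errlog->ntotal++;
    if (errlog->nkept < errlog->limit)
    {
        snprintf(errlog->reports[errlog->nkept], REPORT_LEN,
                 "(%u,%u): %s", block, offset, what);
        errlog->nkept++;
    }
}
```

For example, with a limit of 2 and three reported problems, ntotal ends up as 3 while only the first two messages are kept, so the caller still learns how widespread the damage is.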

> On 14 Dec 2017, at 0:02, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>>
>> This could also test the reproducibility of the tests with a fixed
>> seed number and at least two rounds, a low number of elements could be
>> more appropriate to limit the run time.
>
> The runtime is already dominated by pg_regress overhead. As it says in
> the README, using a fixed seed in the test harness is pointless,
> because it won't behave in a fixed way across platforms. As long as we
> cannot ensure deterministic behavior, we may as well fully embrace
> non-determinism.
I think determinism across platforms is not as important as determinism across runs.
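A minimal sketch of the distinction, using the POSIX srandom()/random() pair as a stand-in for whatever seeding a test harness would use (draw_sequence is a hypothetical helper, not amcheck code): with a fixed seed, the same C library reproduces the same sequence on every run, even though a different libc may produce a different sequence for that seed.

```c
#define _XOPEN_SOURCE 700 /* random()/srandom() are XSI functions */
#include <assert.h>
#include <stdlib.h>

/*
 * Fill out[0..n-1] with values from random() after seeding with `seed`.
 * On a given C library the result depends only on the seed, so repeated
 * runs are reproducible; other libc implementations may differ.
 */
static void
draw_sequence(unsigned seed, long *out, int n)
{
    assert(n >= 0 && (out != NULL || n == 0));
    srandom(seed);
    for (int i = 0; i < n; i++)
        out[i] = random();
}
```

Calling draw_sequence(42, ...) twice yields identical arrays on the same platform, which is the property that makes a failing run repeatable.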

Thanks for amcheck! It is very useful.

Best regards, Andrey Borodin.
