From: | Konstantin Knizhnik <knizhnik(at)garret(dot)ru> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Bug in amcheck? |
Date: | 2025-10-22 16:29:51 |
Message-ID: | 33e39552-6a2a-46f3-8b34-3f9f8004451f@garret.ru |
Lists: | pgsql-hackers |
Hi hackers.
We see the following error reported by amcheck (I have added a dump of
the opaque flags to the message) when it runs concurrently with
autovacuum and cancels it:
ERROR: mismatch between parent key and child high key in index
"pg_attribute_relid_attnam_index"
DETAIL: Target block=274, target opaque->flags=0, child block=427,
child opaque=11, target page lsn=1/484A8FC8.
CONTEXT: SQL statement "SELECT bt_index_parent_check(indexrelid, true,
true) from pg_index"
So the child page has the BTP_HALF_DEAD bit set.
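To make the flag reading explicit: if the dumped child value 11 is hexadecimal (an assumption, since the patched message format is not shown here), then 0x11 decodes to BTP_LEAF | BTP_HALF_DEAD. Below is a minimal standalone sketch that decodes a btpo_flags value. The bit definitions are copied from src/include/access/nbtree.h; describe_btp_flags() is a hypothetical helper, not part of PostgreSQL.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Bit definitions copied from src/include/access/nbtree.h so that the
 * example compiles on its own. */
#define BTP_LEAF             (1 << 0)
#define BTP_ROOT             (1 << 1)
#define BTP_DELETED          (1 << 2)
#define BTP_META             (1 << 3)
#define BTP_HALF_DEAD        (1 << 4)
#define BTP_SPLIT_END        (1 << 5)
#define BTP_HAS_GARBAGE      (1 << 6)
#define BTP_INCOMPLETE_SPLIT (1 << 7)
#define BTP_HAS_FULLXID      (1 << 8)

static const struct
{
    uint16_t    bit;
    const char *name;
} btp_flag_names[] = {
    {BTP_LEAF, "BTP_LEAF"},
    {BTP_ROOT, "BTP_ROOT"},
    {BTP_DELETED, "BTP_DELETED"},
    {BTP_META, "BTP_META"},
    {BTP_HALF_DEAD, "BTP_HALF_DEAD"},
    {BTP_SPLIT_END, "BTP_SPLIT_END"},
    {BTP_HAS_GARBAGE, "BTP_HAS_GARBAGE"},
    {BTP_INCOMPLETE_SPLIT, "BTP_INCOMPLETE_SPLIT"},
    {BTP_HAS_FULLXID, "BTP_HAS_FULLXID"},
};

/*
 * Hypothetical helper: write the names of all set bits into buf,
 * separated by '|', and return buf. Stops early if buf is too small.
 */
static char *
describe_btp_flags(uint16_t flags, char *buf, size_t buflen)
{
    size_t      used = 0;

    if (buflen == 0)
        return buf;
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof(btp_flag_names) / sizeof(btp_flag_names[0]); i++)
    {
        int         n;

        if ((flags & btp_flag_names[i].bit) == 0)
            continue;
        n = snprintf(buf + used, buflen - used, "%s%s",
                     used > 0 ? "|" : "", btp_flag_names[i].name);
        if (n < 0 || (size_t) n >= buflen - used)
            break;              /* output truncated */
        used += (size_t) n;
    }
    return buf;
}
```

Calling describe_btp_flags(0x11, buf, sizeof(buf)) fills buf with "BTP_LEAF|BTP_HALF_DEAD", while the target's flags value 0 yields an empty string (an internal page with no special bits set).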
Autovacuum is interrupted at this point in _bt_pagedel:
    /*
     * Check here, as calling loops will have locks held, preventing
     * interrupts from being processed.
     */
    CHECK_FOR_INTERRUPTS();
Reproducing it is not so easy.
First of all, I added a sleep here:
    /*
     * Check here, as calling loops will have locks held, preventing
     * interrupts from being processed.
     */
    pg_usleep(10000);
    CHECK_FOR_INTERRUPTS();
Then I created two procedures:
create or replace procedure create_tables(tables integer, partitions integer) as $$
declare
    i integer;
    j integer;
begin
    for i in 1..tables
    loop
        execute 'DROP TABLE IF EXISTS t_' || i;
        execute 'CREATE TABLE t_' || i || '(pk integer) partition by range (pk)';
        for j in 1..partitions
        loop
            execute 'create table p_'||i||'_'||j||' partition of t_'||i||' for values from ('||j||') to ('||(j + 1)||')';
        end loop;
        execute 'insert into t_'||i||' values (generate_series(1,'||partitions||'))';
    end loop;
end;
$$ language plpgsql;
and
create or replace procedure run_amcheck() as $$
begin
    loop
        if (select count(*) from pg_stat_activity where backend_type='autovacuum worker') > 0
        then
            raise notice 'Run amcheck!';
            perform bt_index_parent_check(indexrelid, true, true) from pg_index;
        end if;
        perform pg_sleep(1);
    end loop;
end;
$$ language plpgsql;
Then I run run_amcheck() in one session and, concurrently, the following
script with pgbench:
call create_tables(2,1000);
select pg_sleep(2);
If the problem does not reproduce, cancel run_amcheck() and start it
again.
The backtrace (PG16) is the following:
* frame #0: 0x00000001017b6aac amcheck.dylib`bt_child_highkey_check(state=0x000000010c846318, target_downlinkoffnum=37, loaded_child="\U00000001", target_level=1) at verify_nbtree.c:2146:23
  frame #1: 0x00000001017b7fd8 amcheck.dylib`bt_child_check(state=0x000000010c846318, targetkey=0x000000013c01c448, downlinkoffnum=37) at verify_nbtree.c:2262:2
  frame #2: 0x00000001017b5f4c amcheck.dylib`bt_target_page_check(state=0x000000010c846318) at verify_nbtree.c:1623:4
  frame #3: 0x00000001017b3908 amcheck.dylib`bt_check_level_from_leftmost(state=0x000000010c846318, level=(level = 1, leftmost = 3, istruerootlevel = false)) at verify_nbtree.c:859:3
  frame #4: 0x00000001017b24e8 amcheck.dylib`bt_check_every_level(rel=0x0000000140074f18, heaprel=0x0000000130070148, heapkeyspace=true, readonly=true, heapallindexed=true, rootdescend=true) at verify_nbtree.c:603:13
  frame #5: 0x00000001017b198c amcheck.dylib`bt_index_check_internal(indrelid=2674, parentcheck=true, heapallindexed=true, rootdescend=true) at verify_nbtree.c:362:3
  frame #6: 0x00000001017b1a78 amcheck.dylib`bt_index_parent_check(fcinfo=0x000000010c83b040) at verify_nbtree.c:242:2
I wonder if we should add a P_ISHALFDEAD(opaque) check for the child page?
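One possible reading of this suggestion, as a standalone sketch rather than a patch against verify_nbtree.c: skip the comparison of the parent key with the child's high key when the child is half-dead. DemoOpaque and should_compare_child_highkey() are hypothetical names used only for illustration; BTP_HALF_DEAD and the shape of P_ISHALFDEAD() follow src/include/access/nbtree.h. Whether skipping the comparison is the right fix is exactly the open question.

```c
#include <stdbool.h>
#include <stdint.h>

/* Copied from src/include/access/nbtree.h for a self-contained example. */
#define BTP_HALF_DEAD   (1 << 4)
#define P_ISHALFDEAD(opaque) (((opaque)->btpo_flags & BTP_HALF_DEAD) != 0)

/* Hypothetical stand-in for BTPageOpaqueData; only the flags field matters here. */
typedef struct DemoOpaque
{
    uint16_t    btpo_flags;
} DemoOpaque;

/*
 * Hypothetical decision helper: return true if the parent key should be
 * compared with this child's high key. Under the assumption discussed in
 * this message, a half-dead child is excluded from that comparison.
 */
static bool
should_compare_child_highkey(const DemoOpaque *child)
{
    return !P_ISHALFDEAD(child);
}
```

With the flags from the report above (0x11, assuming hex), should_compare_child_highkey() returns false, so the "mismatch between parent key and child high key" error would not be raised for that child in this model.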