Re: BUG #17245: Index corruption involving deduplicated entries

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Kamigishi Rei <iijima(dot)yun(at)koumakan(dot)jp>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, pgsql-bugs(at)lists(dot)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: BUG #17245: Index corruption involving deduplicated entries
Date: 2021-11-08 00:28:46
Message-ID: 20211108002846.GL17618@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

I was going to ask if it's possible to look for corrupted indexes somehow, but
already discovered two likely instances of this issue:

$ /usr/pgsql-14/bin/pg_amcheck --relation ts.child.alarms_null_alarm_clear_time_idx
btree index "ts.child.alarms_null_alarm_clear_time_idx":
ERROR: posting list contains misplaced TID in index "alarms_null_alarm_clear_time_idx"
DETAIL: Index tid=(26,10) posting list offset=88 page lsn=223E/92E3DA40.

The index in one case is now 448 kB, in the other case is 632 kB, so I suspect
they're due to this bug (for the 2nd one, a new index created from scratch with
the same definition is 144 kB).

I tried looking for duplicate TIDs like this (which I wrote to search every
suspect table before I found the above issue) - I'm not sure this is right,
though.

| SELECT format('SELECT htid, COUNT(1), %L, %L, %L FROM bt_page_items(%L||''.''||%L,%s) WHERE htid IS NOT NULL AND tids IS NOT NULL GROUP BY 1 HAVING COUNT(1)>1 LIMIT 1', relname, blk, COUNT(1)OVER(), relnamespace::regnamespace, relname, blk ) FROM pg_class c JOIN pg_am a ON c.relam=a.oid AND a.amname='btree' , generate_series(1,pg_relation_size(c.oid)/8192 - 1)blk WHERE c.relkind='i' AND relpages<99 AND (relnamespace=to_regnamespace('child') AND relname~'null') ;

But this doesn't find any issue in that index. If I'm not wrong, "duplicate
TIDs" are just one likely symptom of the bug, but not its only consequence.
I donno how the TIDs can get out of order, though?

I thought you might like to know about another instance of this in the wild.
I'll disable those indexes and leave them around for a bit in case there's any
question about them.

--
Justin

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Peter Geoghegan 2021-11-08 00:54:45 Re: BUG #17245: Index corruption involving deduplicated entries
Previous Message Noah Misch 2021-11-07 23:22:02 Re: CREATE INDEX CONCURRENTLY does not index prepared xact's data