Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal

From: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
To: Robins Tharakan <tharakan(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Peter Geoghegan <pg(at)bowt(dot)ie>, James Coleman <jtc331(at)gmail(dot)com>
Subject: Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal
Date: 2024-03-21 06:16:42
Message-ID: 67EADE8F-AEA6-4B73-8E38-A69E5D48BAFE@yandex-team.ru
Lists: pgsql-hackers

> On 29 Jun 2022, at 17:43, Robins Tharakan <tharakan(at)gmail(dot)com> wrote:

Sorry to bump an ancient thread; I have some observations that may or may not be relevant.
Recently we noticed corruption on one of our clusters. The corruption is not in the system catalog but in user indexes.
The cluster was correctly configured: checksums, fsync, FPI, etc.
The cluster was never restored from a backup. It's a single-node cluster, so it was never promoted, pg_rewind-ed, etc. The VM had never been rebooted.

However, the cluster had been experiencing 10 OOMs a day. There were no torn pages and no checksum errors in the log at all. Yet B-tree indexes became corrupted.

Sorry for the wall of text; I'm posting everything as-is in case some of it is useful.

$ /etc/cron.yandex/pg_corruption_check.py --index
2024-03-01 11:54:05,075 ERROR : Corrupted index: 96009 table1_table1message_table1_team_identity_06a95642 XX002 ERROR: posting list contains misplaced TID in index "table1_table1message_table1_team_identity_06a95642" DETAIL: Index tid=(267,34) posting list offset=137 page lsn=31B/62159608.
2024-03-01 11:54:05,100 ERROR : Corrupted index: 96008 table1_table1message_organization_id_66c18ed2 XX002 ERROR: posting list contains misplaced TID in index "table1_table1message_organization_id_66c18ed2" DETAIL: Index tid=(267,34) posting list offset=137 page lsn=31B/62158BC8.
2024-03-01 11:54:05,355 ERROR : Corrupted index: 95804 table2_aler_channel_81aeec_idx XX002 ERROR: posting list contains misplaced TID in index "table2_aler_channel_81aeec_idx" DETAIL: Index tid=(336,7) posting list offset=182 page lsn=314/9B794248.
2024-03-01 11:54:05,716 ERROR : Corrupted index: 95816 table2_table3_channel_id_91a1912f XX002 ERROR: posting list contains misplaced TID in index "table2_table3_channel_id_91a1912f" DETAIL: Index tid=(384,2) posting list offset=72 page lsn=317/3F14F390.
2024-03-01 11:54:06,068 ERROR : Corrupted index: 95815 table2_table3_channel_filter_id_6706c8b6 XX002 ERROR: posting list contains misplaced TID in index "table2_table3_channel_filter_id_6706c8b6" DETAIL: Index tid=(380,2) posting list offset=72 page lsn=317/3F0D8E30.
2024-03-01 11:54:06,302 ERROR : Corrupted index: 95824 table2_table3_root_alert_group_id_f327f122 XX002 ERROR: item order invariant violated for index "table2_table3_root_alert_group_id_f327f122" DETAIL: Lower index tid=(368,204) (points to heap tid=(48901,2)) higher index tid=(368,205) (points to heap tid=(48901,2)) page lsn=319/3C234588.
2024-03-01 11:54:06,538 ERROR : Corrupted index: 95810 table2_table3_acknowledged_by_user_id_dd6723dc XX002 ERROR: posting list contains misplaced TID in index "table2_table3_acknowledged_by_user_id_dd6723dc" DETAIL: Index tid=(380,69) posting list offset=35 page lsn=317/C14E2D50.
2024-03-01 11:54:06,775 ERROR : Corrupted index: 95825 table2_table3_silenced_by_user_id_40a833a1 XX002 ERROR: posting list contains misplaced TID in index "table2_table3_silenced_by_user_id_40a833a1" DETAIL: Index tid=(371,11) posting list offset=144 page lsn=318/61171918.
2024-03-01 11:54:07,009 ERROR : Corrupted index: 95829 table2_table3_wiped_by_id_4326ff61 XX002 ERROR: item order invariant violated for index "table2_table3_wiped_by_id_4326ff61" DETAIL: Lower index tid=(373,97) (points to heap tid=(48901,2)) higher index tid=(373,98) (points to heap tid=(48901,2)) page lsn=318/61172788.
2024-03-01 11:54:07,245 ERROR : Corrupted index: 95823 table2_table3_resolved_by_user_id_463cdf3d XX002 ERROR: posting list contains misplaced TID in index "table2_table3_resolved_by_user_id_463cdf3d" DETAIL: Index tid=(375,89) posting list offset=144 page lsn=319/3C1DCFC8.
2024-03-01 11:54:07,479 ERROR : Corrupted index: 95819 table2_table3_maintenance_uuid_9a7b8529_like XX002 ERROR: item order invariant violated for index "table2_table3_maintenance_uuid_9a7b8529_like" DETAIL: Lower index tid=(372,4) (points to heap tid=(48901,2)) higher index tid=(372,5) (points to heap tid=(48901,2)) page lsn=317/C1A210A8.
2024-03-01 11:54:07,717 ERROR : Corrupted index: 95827 table2_table3_table1_message_id_58a31784_like XX002 ERROR: posting list contains misplaced TID in index "table2_table3_table1_message_id_58a31784_like" DETAIL: Index tid=(373,89) posting list offset=144 page lsn=319/3C3EE660.
2024-03-01 11:54:08,162 ERROR : Corrupted index: 96066 webhooks_webhookresponse_webhook_id_db49ebcd XX002 ERROR: item order invariant violated for index "webhooks_webhookresponse_webhook_id_db49ebcd" DETAIL: Lower index tid=(522,24) (points to heap tid=(73981,1)) higher index tid=(522,25) (points to heap tid=(73981,1)) page lsn=31B/E522B640.
2024-03-01 11:54:08,646 ERROR : Corrupted index: 95822 table2_table3_resolved_by_alert_id_bbdf0a83 XX002 ERROR: posting list contains misplaced TID in index "table2_table3_resolved_by_alert_id_bbdf0a83" DETAIL: Index tid=(618,2) posting list offset=150 page lsn=317/C1DE74B8.
2024-03-01 11:54:08,873 ERROR : Corrupted index: 95427 table2_table3_table1_message_id_key XX002 ERROR: item order invariant violated for index "table2_table3_table1_message_id_key" DETAIL: Lower index tid=(369,134) (points to heap tid=(48901,2)) higher index tid=(369,135) (points to heap tid=(48901,2)) page lsn=319/3B629E58.
2024-03-01 11:54:09,108 ERROR : Corrupted index: 95417 table2_table3_maintenance_uuid_key XX002 ERROR: posting list contains misplaced TID in index "table2_table3_maintenance_uuid_key" DETAIL: Index tid=(371,42) posting list offset=47 page lsn=318/6116FC50.
2024-03-01 11:54:10,180 ERROR : Corrupted index: 95826 table2_table3_table1_log_message_id_587aaa8d_like XX002 ERROR: posting list contains misplaced TID in index "table2_table3_table1_log_message_id_587aaa8d_like" DETAIL: Index tid=(849,19) posting list offset=79 page lsn=319/3C389B60.
2024-03-01 11:54:10,689 ERROR : Corrupted index: 95820 table2_table3_mattermost_log_message_id_69bc2ae4_like XX002 ERROR: item order invariant violated for index "table2_table3_mattermost_log_message_id_69bc2ae4_like" DETAIL: Lower index tid=(559,4) (points to heap tid=(48901,2)) higher index tid=(559,5) (points to heap tid=(48901,2)) page lsn=317/C1A7BA50.
2024-03-01 11:54:11,760 ERROR : Corrupted index: 95425 table2_table3_table1_log_message_id_key XX002 ERROR: item order invariant violated for index "table2_table3_table1_log_message_id_key" DETAIL: Lower index tid=(849,22) (points to heap tid=(48901,2)) higher index tid=(849,23) (points to heap tid=(48901,2)) page lsn=317/3E7EC1F0.
2024-03-01 11:54:12,282 ERROR : Corrupted index: 95419 table2_table3_mattermost_log_message_id_key XX002 ERROR: posting list contains misplaced TID in index "table2_table3_mattermost_log_message_id_key" DETAIL: Index tid=(566,84) posting list offset=65 page lsn=319/3B1901F8.
2024-03-01 11:54:17,990 ERROR : Corrupted index: 95423 table2_table3_public_primary_key_key XX002 ERROR: cross page item order invariant violated for index "table2_table3_public_primary_key_key" DETAIL: Last item on page tid=(727,146) page lsn=31B/E104D660.
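The in-page errors above use the wording of amcheck's B-tree checks. The following is a simplified model for illustration, not amcheck's actual code. In the B-tree format used since PostgreSQL 12, the heap TID acts as a tiebreaker among equal keys. Posting lists, added with deduplication in PostgreSQL 13, must keep their TIDs in strictly ascending order. Several reports above show two adjacent items that both point at heap tid=(48901,2), and such a pair cannot satisfy the ordering rule. The helper names and the key value 42 are hypothetical, since the real key values do not appear in the log.

```python
from typing import NamedTuple, Sequence, Tuple


class TID(NamedTuple):
    """Heap tuple identifier: (block number, line pointer offset)."""
    block: int
    offset: int


def posting_list_ok(tids: Sequence[TID]) -> bool:
    """TIDs within one posting list must be strictly ascending (no duplicates)."""
    return all(a < b for a, b in zip(tids, tids[1:]))


def item_order_ok(lower: Tuple[int, TID], higher: Tuple[int, TID]) -> bool:
    """Adjacent leaf items (key, heap_tid) must be strictly ordered:
    by key first, then by heap TID as the tiebreaker.

    Simplified model: keys are plain ints and posting-list tuples are ignored.
    """
    return lower < higher


# Hypothetical key 42; both items point to heap tid=(48901,2), as in the log.
print(item_order_ok((42, TID(48901, 2)), (42, TID(48901, 2))))  # False
print(posting_list_ok([TID(1, 1), TID(1, 2), TID(2, 1)]))  # True
```

Tuple comparison in Python is lexicographic, which mirrors comparing a key first and then the (block, offset) pair.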

Most of these messages look similar, except the last one: “cross page item order invariant violated for index”. Indeed, index scans were getting stuck in a cycle.
I have not been able to locate the problem in the WAL yet, because a lot of other activity is going on. But I have no other ideas; I suspect that posting list redo corrupts the index after a crash.
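The following interpretation is hedged and not confirmed in the report. The hanging scans fit a right-sibling chain that loops back on itself. In that case, a scan moving right never reaches the rightmost page. Somewhere along the loop, the last key on a page would typically exceed the first key on its right sibling, which is the condition the cross-page check reports. The sketch below detects such a loop. It assumes a hypothetical in-memory map from block number to right sibling. Real nbtree pages keep this link in their special area (btpo_next). The block numbers are made up.

```python
from typing import Dict, List, Optional


def walk_right_links(rightlink: Dict[int, Optional[int]], start: int) -> List[int]:
    """Follow right-sibling links from `start` and return the blocks visited.

    `rightlink` maps a block number to its right sibling (None = rightmost page).
    A scan that blindly follows a looping chain never terminates, so this
    tracks visited blocks and raises on the first revisit instead.
    """
    seen = set()
    visited: List[int] = []
    blk: Optional[int] = start
    while blk is not None:
        if blk in seen:
            raise RuntimeError(f"right-link cycle: block {blk} revisited")
        seen.add(blk)
        visited.append(blk)
        blk = rightlink[blk]
    return visited


# Hypothetical chains for illustration only.
print(walk_right_links({1: 2, 2: 3, 3: None}, 1))  # [1, 2, 3]
try:
    walk_right_links({1: 2, 2: 3, 3: 2}, 1)
except RuntimeError as e:
    print(e)  # right-link cycle: block 2 revisited
```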

Thanks!

Best regards, Andrey Borodin.
