Re: BUG #17386: btree index corruption after reindex concurrently on write heavy table

From: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
To: "maxim(dot)boguk" <maxim(dot)boguk(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17386: btree index corruption after reindex concurrently on write heavy table
Date: 2022-01-28 14:17:28
Message-ID: CAK-MWwQ7fboHz_s8aTn08i0W2aNH6QoELeWwmH-y80tOfsdDBQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Fri, Jan 28, 2022 at 3:42 PM PG Bug reporting form <
noreply(at)postgresql(dot)org> wrote:

> The following bug has been logged on the website:
>
> Bug reference: 17386
> Logged by: Maxim Boguk
> Email address: maxim(dot)boguk(at)gmail(dot)com
> PostgreSQL version: 14.1
> Operating system: Ubuntu Linux
> Description:
>
> Hi,
>
> I found multiple cases of Btree index corruption after REINDEX someindex
> CONCURRENTLY;.
>
> What I found so far:
> 1)corruption exists in master, on all replicas and test server recovered
> from base backup + wal archive (so it isn't some local hardware error)
> 2)it doesn't happen every time but at least two cases of corrupted indexes
> found after database verification with amcheck.
> 3)it happen on huge and write heavy table
>

Attempt fix situation with REINDEX INDEX jobs_pkey CONCURRENTLY;
lead to corrupted index again in different tuple:
ERROR: heap tuple (69306318,15) from table "jobs" lacks matching index
tuple within index "jobs_pkey"

SELECT * FROM heap_page_item_attrs(get_raw_page('jobs', 69306318),
'jobs'::regclass) where lp=15;
-[ RECORD 1 ]---
lp | 15
lp_off | 24
lp_flags | 2
lp_len | 0

select ctid,id,updated_at,created_at from jobs where
ctid='(69306318,24)'::tid;
-[ RECORD 1 ]--------------------------
ctid | (69306318,24)
id | 26192320674
updated_at | 2022-01-28 13:08:27.224275
created_at | 2021-11-17 14:35:59.441979

And again updated_at happens during the REINDEX run.

select ctid,id,updated_at,created_at from jobs where id=26192320674;
(0 rows)

SELECT t_ctid, raw_flags, combined_flags FROM
heap_page_item_attrs(get_raw_page('jobs', 69306318), 'jobs'::regclass),
LATERAL heap_tuple_infomask_flags(t_infomask, t_infomask2) where lp=24;
-[ RECORD 1
]--+---------------------------------------------------------------------------------------------------------------------------------------------
t_ctid | (69306318,24)
raw_flags |
{HEAP_HASNULL,HEAP_HASVARWIDTH,HEAP_XMAX_KEYSHR_LOCK,HEAP_XMAX_LOCK_ONLY,HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE}
combined_flags | {}

Seems something broken in combination of HOT update and REINDEX
CONCURRENTLY.

--
Maxim Boguk
Senior Postgresql DBA
https://dataegret.com/

Phone RU: +7 985 433 0000
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk

"Доктор, вы мне советовали так не делать, но почему мне по-прежнему больно
когда я так делаю ещё раз?"

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Dmitry Koval 2022-01-28 15:26:08 Re: BUG #17385: "RESET transaction_isolation" inside serializable transaction causes Assert at the transaction end
Previous Message 两个孩子的爹 2022-01-28 13:42:47 Re: BUG #17382: When vacuum full or vacuumdb - F is executed, a large number of empty files will be generated in the