Re: Pg14 possible index corruption after reindex concurrently

From: Aleš Zelený <zeleny(dot)ales(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: "pgsql-generallists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Pg14 possible index corruption after reindex concurrently
Date: 2022-05-25 16:17:37
Message-ID: CAODqTUab7sxVALWm=7N5mBxz9Q_FWKWLD6jUm6FFxU9hNvMcHg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello,

you are right it looks to be related, on our running system with connected
users such an issue happens not only on primary keys but also on other
(non-unique) indexes.

I've checked all indexes using amcheck:

select * from bt_index_check(index =>
'prematch.opportunities.pk_tabodds_idodds'::regclass::oid, heapallindexed
=> true);

Failed one rebuild and check again to ensure all things are OK.

Still have a problem and haven't found any explanation in the manuals:

Run amcheck to validate a table and its indexes:

/usr/pgsql-14/bin/pg_amcheck -d prematch -r opportunities.tab_odds
--heapallindexed -j 16 -Pv ; echo $?
...
btree index "prematch.opportunities.pk_tabodds_idodds":
ERROR: heap tuple (2199116,5) from table "tab_odds" lacks matching
index tuple within index "pk_tabodds_idodds"
HINT: Retrying verification using the function bt_index_parent_check()
might provide a more specific error.
2

OK, rebuild the index (first give it a try to use the concurrently option):
prematch=# REINDEX INDEX CONCURRENTLY
prematch.opportunities.pk_tabodds_idodds;

REINDEX
prematch=#
prematch=# select * from bt_index_check(index =>
'prematch.opportunities.pk_tabodds_idodds'::regclass::oid, heapallindexed
=> true);

bt_index_check
----------------

(1 row)

Looks we were lucky.

Just to be sure, one more check from the command line:

/usr/pgsql-14/bin/pg_amcheck -d prematch -r opportunities.tab_odds
--heapallindexed -j 16 -P ; echo $?
0/15 relations (0%), 0/16807950 pages (0%)
heap table "prematch.opportunities.tab_odds", block 1649057, offset 47:
xmin 4062380236 precedes relation freeze threshold 2:3960858664
heap table "prematch.opportunities.tab_odds", block 1649057, offset 48:
xmax 4062380236 precedes relation freeze threshold 2:3960858664
...
xmax 4062380813 precedes relation freeze threshold 2:3960858664
heap table "prematch.opportunities.tab_odds", block 2210728, offset 8:
xmax 4062380814 precedes relation freeze threshold 2:3960858664
2

So I got non-zero exit code, but no errors in the log file:

-bash-4.2$ grep -i ERROR 20220525_1710_prematch.opportunities.tab_odds.log
-bash-4.2$

We have multiple DB servers upgraded to 14, only a few amchecks:

( time /usr/pgsql-14/bin/pg_amcheck -a --heapallindexed -j 16 -P ; echo $?
) |& tee -a `date +%Y%m%d_%H%M`_`hostname`_amcheck.log

finished with zero exit code, most of them returned 2, within the xmin/xmax
messages as shown above. Is that an issue (since missing entries in the
index cause ERRRO reported in the log, but after rebuilding the indexes it
was not the case anymore)?
If it is an issue, is there a way to fix it?

Thanks Ales

út 24. 5. 2022 v 21:58 odesílatel Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
napsal:

> On Wed, May 25, 2022 at 6:17 AM Aleš Zelený <zeleny(dot)ales(at)gmail(dot)com> wrote:
> > SELECT format('REINDEX SCHEMA CONCURRENTLY %I;', n.nspname)
>
> This may be related to bug #17485, discussed at:
>
>
> https://www.postgresql.org/message-id/flat/17485-396609c6925b982d%40postgresql.org
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2022-05-25 17:02:35 Re: Extension pg_trgm, permissions and pg_dump order
Previous Message Carsten Klein 2022-05-25 15:26:03 Re: Connect to specific cluster on command line