Re: amcheck verification for GiST and GIN

From: Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Grigory Kryachko <gskryachko(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: amcheck verification for GiST and GIN
Date: 2021-07-29 18:34:47
Message-ID: CANNMO++6zs6pgY8shQgAor0AqMSgQfeEtpauQRWOY7_nSFXPRQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

First of all, thank you all -- Andrey, Peter, Heikki and others -- for this
work, GIN support in amcheck is *really* needed, especially for OS upgrades
such as from Ubuntu 16.04 (which is EOL now) to 18.04 or 20.04

I was trying to check a bunch of GINs on some production after switching
from Ubuntu 16.04 to 18.04 and got many errors. So decided to check for
16.04 first (that is still used on prod for that DB), without any OS/glibc
changes.

On 16.04, I still saw errors and it was not really expected because this
should mean that production is corrupted too. So, REINDEX should fix it.
But it didn't -- see output below. I cannot give data and thinking how to
create a synthetic demo of this. Any suggestions?

And is this a sign that the tool is wrong rather that we have a real
corruption cases? (I assume if we did, we would see no errors after
REINDEXing -- of course, if GIN itself doesn't have bugs).

Env: Ubuntu 16.04 (so, glibc 2.27), Postgres 12.7, patch from Heikki
slightly adjusted to work with PG12 (
https://gitlab.com/postgres/postgres/-/merge_requests/5) snippet used to
run amcheck:
https://gitlab.com/-/snippets/2001962 (see file #3)

Before reindex:

INFO: [2021-07-29 17:44:42.525+00] Processing 4/29: index:
index_XXX_trigram (index relpages: 117935; heap tuples: ~379793)...

ERROR: index "index_XXX_trigram" has wrong tuple order, block 65754, offset
232

test=# reindex index index_XXX_trigram;

REINDEX

After REINDEX:

INFO: [2021-07-29 18:01:23.339+00] Processing 4/29: index:
index_XXX_trigram (index relpages: 70100; heap tuples: ~379793)...

ERROR: index "index_XXX_trigram" has wrong tuple order, block 70048, offset
253

On Thu, Jul 15, 2021 at 00:03 Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:

> On 07/08/2020 00:33, Peter Geoghegan wrote:
> > On Wed, May 27, 2020 at 10:11 AM Grigory Kryachko <gskryachko(at)gmail(dot)com>
> wrote:
> >> Here is the patch which I (with Andrey as my advisor) built on the top
> of the last patch from this thread:
> https://commitfest.postgresql.org/25/1800/ .
> >> It adds an ability to verify validity of GIN index. It is not polished
> yet, but it works and we wanted to show it to you so you can give us some
> feedback, and also let you know about this work if you have any plans of
> writing something like that yourselves, so that you do not redo what is
> already done.
> >
> > Can you rebase this patch, please?
> >
> > Also suggest breaking out the series into distinct patch files using
> > "git format-patch master".
>
> I rebased the GIN parts of this patch, see attached. I also ran pgindent
> and made some other tiny cosmetic fixes, but I didn't review the patch,
> only rebased it in the state it was.
>
> I was hoping that this would be useful to track down the bug we're
> discussing here:
>
> https://www.postgresql.org/message-id/CAJYBUS8aBQQL22oHsAwjHdwYfdB_NMzt7-sZxhxiOdEdn7cOkw%40mail.gmail.com.
>
> But now that I look what checks this performs, I doubt this will catch
> the kind of corruption that's happened there. I suspect it's more subtle
> than an inconsistencies between parent and child pages, because only a
> few rows are affected. But doesn't hurt to try.
>
> - Heikki
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2021-07-29 18:35:01 Re: pg_upgrade does not upgrade pg_stat_statements properly
Previous Message Bruce Momjian 2021-07-29 18:28:55 Re: pg_upgrade does not upgrade pg_stat_statements properly