Re: BUG #17268: Possible corruption in toast index after reindex index concurrently

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
Cc: Alexey Ermakov <alexey(dot)ermakov(at)dataegret(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>
Subject: Re: BUG #17268: Possible corruption in toast index after reindex index concurrently
Date: 2021-11-04 20:20:58
Message-ID: CAH2-WznMD0TOL5njf0udBZdDpagu+ZCLE9E-TpLXqezvPy6Z9A@mail.gmail.com
Lists: pgsql-bugs

On Thu, Nov 4, 2021 at 12:47 PM Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com> wrote:
> now... and yes during the time of error page 59561917 was very close
> to the end of the table.
> There was a high chance (but not 100%) that the corresponding main
> table entry had been inserted during reindex CONCURRENTLY of the toast
> index run.

It might be useful if you located the leaf page that the missing index
tuple is supposed to be on. It's possible that there is a recognizable
pattern. If you knew the block number of the relevant leaf page in the
index already, then you could easily dump the relevant page to a small
file, and share it with us here. The usual procedure is described
here:

https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#contrib.2Fpageinspect_page_dump
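For reference, here is a minimal sketch of what that procedure boils down to
on the SQL side, assuming the pageinspect extension can be installed and that
you run it as a superuser. The block number 1234 is made up purely for
illustration; substitute the real one once it is known.

```sql
-- pageinspect provides get_raw_page(); installing it requires superuser.
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Return one 8kB index page as base64 text, which can then be decoded
-- into a small binary file on the client side.
-- NOTE: 1234 is a hypothetical block number, not one from this report.
SELECT encode(
         get_raw_page('pg_toast.pg_toast_2624976286_index', 1234),
         'base64');
```

Running this through psql with unaligned, tuples-only output and piping the
result through a base64 decoder should give a file that is exactly one page
in size.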

The tricky part is figuring out which block number is the one of
interest. I can't think of any easy way of doing that in a production
database. The easiest approach I can think of is to use
pg_buffercache. Restart the database (or more likely an instance of
the database that has the problem, but isn't the live production
database), so that shared_buffers starts out empty and the only pages
from the index that end up cached are the ones your query touched.
Then run a query like this:

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM pg_toast.pg_toast_2624976286 WHERE
chunk_id = 4040061139;

(The BUFFERS option is there to verify that the query really accessed
index buffers. Right after a restart these will be reported as reads
rather than hits.)

Next query the blocks that you see in pg_buffercache:

SELECT relblocknumber FROM pg_buffercache WHERE relfilenode =
pg_relation_filenode('pg_toast.pg_toast_2624976286_index');

Finally, note down the block numbers that the query returns. There
will probably be 3 or 4. Just send me the pages for any block numbers
that are non-0 (block 0 is just the metapage). I only really care
about the leaf page, but I can figure that part out myself once I have
the pages that the query accessed.
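If pageinspect is installed, something along these lines might narrow the
list down to the leaf page directly. Treat it as a sketch, not a tested
recipe: it assumes both extensions are available, that it is run as a
superuser right after the EXPLAIN query above, and that the planner applies
the block filter before calling bt_page_stats() (which rejects block 0, the
metapage).

```sql
CREATE EXTENSION IF NOT EXISTS pg_buffercache;
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- For every cached block of the toast index (main fork only, metapage
-- excluded), report the B-tree page type: 'l' = leaf, 'i' = internal,
-- 'r' = root.
SELECT b.relblocknumber,
       s.type,
       s.live_items,
       s.dead_items
FROM pg_buffercache AS b
CROSS JOIN LATERAL
     bt_page_stats('pg_toast.pg_toast_2624976286_index',
                   b.relblocknumber::int) AS s
WHERE b.relfilenode =
      pg_relation_filenode('pg_toast.pg_toast_2624976286_index')
  AND b.reldatabase = (SELECT oid FROM pg_database
                       WHERE datname = current_database())
  AND b.relforknumber = 0      -- main fork
  AND b.relblocknumber <> 0    -- skip the metapage
ORDER BY b.relblocknumber;
```

The row with type 'l' would be the leaf page that the index scan descended
to, which is the one worth dumping.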

(It's a pity there isn't a less cumbersome procedure here.)

--
Peter Geoghegan
