Re: Index corruption with CREATE INDEX CONCURRENTLY

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)bowt(dot)ie>, Keith Fiske <keith(at)omniti(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Index corruption with CREATE INDEX CONCURRENTLY
Date: 2017-02-19 10:22:54
Message-ID: CABOikdPVi=DxSmaVJz8j+gBMpQ9A+6ka_DgbRYnNnAwaFt01xw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Feb 19, 2017 at 3:43 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Fri, Feb 17, 2017 at 11:15 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> > Ah, nah, scratch that. If any post-index-build pruning had occurred on a
> > page, the evidence would be gone --- the non-matching older tuples would
> > be removed and what remained would look consistent. But it wouldn't
> > match the index. You might be able to find problems if you were willing
> > to do the expensive check on *every* HOT chain, but that seems none too
> > practical.
>
> If the goal is just to detect tuples that aren't indexed and should
> be, what about scanning each index and building a TIDBitmap of all of
> the index entries, and then scanning the heap for non-HOT tuples and
> probing the TIDBitmap for each one? If you find a heap entry for
> which you didn't find an index entry, go and recheck the index and see
> if one got added concurrently. If not, whine.
>
>
This particular case of corruption results in a heap tuple getting indexed
by a wrong key (or to be precise, indexed by its old value). So the only
way to detect the corruption is to look at each index key and check if it
matches with the corresponding heap tuple. We could write some kind of self
join that can use a sequential scan and an index-only scan (David Rowley
had suggested something of that sort internally here), but we can't
guarantee index-only scan on a table which is being concurrently updated.
So not sure even that will catch every possible case.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-02-19 10:32:44 Re: Parallel Index-only scan
Previous Message Robert Haas 2017-02-19 10:21:06 Re: gitlab post-mortem: pg_basebackup waiting for checkpoint