Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Pawel Kudzia <kudzia(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows
Date: 2021-10-14 01:26:20
Message-ID: CAH2-Wzm8gQYhkFPMYgoQcfA=vz7xjd85ZSfXdcHqmx9wFddjSg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Sun, Jul 25, 2021 at 12:08 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> I'll work on a patch to add more sanity checks to the GIN code when it
> traverses the tree, to catch the case that it accidentally steps on a
> wrong kind of a page (I'm pretty busy next week, so might not get to
> that until the week after though). I don't think that will help here,
> but who knows, and at least we can rule out some kinds of bugs.

I just noticed that among call sites to ginInsertCleanup() located in
ginvacuum.c, only one caller specifies fill_fsm=true: The call made
when an autoanalyze (i.e. av worker that just does an ANALYZE) is run.
In other words, a call through the special analyze_only path [1].
Note, in particular, that a plain VACUUM, or a plain VACUUM ANALYZE
will always specify fill_fsm=false, regardless of any of the details.
This seems totally contradictory to me.

(***Thinks some more***)

Actually, that's not quite right -- it's not contradictory once you
realize that fill_fsm=true is an extra special path, originally
designed just for the analyze_only path. That's easy to see once you
look at commit dc943ad952 from 2015. Its specific goal was to allow
this special analyze_only path to recycle pages. We need fill_fsm=true
because the analyze_only path won't scan the whole index again at the
end of ginvacuumcleanup(). (Actually, why even scan it in the similar
cleanup-only VACUUM case? Ugh, that's not worth getting into now.)

In summary: Although I always suspected the fill_fsm=true ShiftList()
path here, it's even easier to suspect it now. Because I see now that
it hardly ever gets used inside autovacuum worker processes, though of
course it is possible. It's probably a matter of luck as to whether
you hit the analyze_only + fill_fsm=true ShiftList() path. That might
well have contributed to our difficulty in recreating the bugs here.

This is such a mess. I'm not sure that this insight will be news to
you, Heikki. Perhaps I'm just venting about the confused state of the
code here, following a recent unpleasant reminder (again, see [1]).

[1] https://postgr.es/m/CAH2-WzkjrK556enVtFLmyXEdw91xGuwiyZVep2kp5yQT_-3JDg@mail.gmail.com

--
Peter Geoghegan

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Michael Paquier 2021-10-14 02:07:21 Re: BUG #17220: ALTER INDEX ALTER COLUMN SET (..) with an optionless opclass makes index and table unusable
Previous Message Masahiko Sawada 2021-10-14 00:59:23 Re: Inconsistent behavior of pg_dump/pg_restore on DEFAULT PRIVILEGES