Re: Non-deterministic IndexTuple toast compression from index_form_tuple() + amcheck false positives

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Non-deterministic IndexTuple toast compression from index_form_tuple() + amcheck false positives
Date: 2019-02-02 02:27:51
Message-ID: CAH2-WznJZXUb_4ZN+e_W7U3rCHUne0TCzX4hK0dY6+VoD6onMw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 23, 2019 at 10:59 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > The fix here must be to normalize index tuples that are compressed
> > within amcheck, both during initial fingerprinting, and during
> > subsequent probes of the Bloom filter in bt_tuple_present_callback().
>
> I happened to talk to Andres about this in person yesterday. He
> thought that there was reason to be concerned about the need for
> logical normalization beyond TOAST issues. Expression indexes were a
> particular concern, because they could in principle have a change in
> the on-disk representation without a change of logical values -- false
> positives could result. He suggested that the long term solution was
> to bring hash operator class hash functions into Bloom filter hashing,
> at least where available.

I think that the best way forward is to normalize to compensate for
inconsistent input datum TOAST state, and leave it at that. ISTM that
logical normalization beyond that (based on hashing, or anything else)
creates more problems than it solves. I am concerned about cases like
INCLUDE indexes (which may have datums that lack even a B-Tree
opclass), and about the logical-though-semantically-relevant facets of
some datatypes such as numeric's display scale. If I can get an
example from Andres of a case where further logical normalization is
necessary to avoid false positives with expression indexes, that may
change things. (BTW, I implemented another amcheck enhancement that
searches indexes from the root to find matches -- the code is a
trivial addition to the new patch series I'm working on, and seems
like a better way to do enhanced logical normalization if that proves
to be truly necessary.)

Attached draft patch fixes the bug by doing fairly simple
normalization. I think that TOAST compression of datums in indexes is
fairly rare in practice, so I'm not very worried about the fact that
this won't perform as well as it could with indexes that have a lot of
compressed datums. I think that the interface I've added might need to
be expanded for other things in the future (e.g., to make amcheck work
with nbtree-native duplicate compression), and not worrying about the
performance too much helps with that goal.

I'll pick this up next week, and likely commit a fix by Wednesday or
Thursday if there are no objections. I'm not sure if the test case is
worth including.

--
Peter Geoghegan

Attachment Content-Type Size
0001-Avoid-amcheck-TOAST-compression-inconsistencies.patch application/octet-stream 13.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2019-02-02 02:31:51 Re: [Patch] Log10 and hyperbolic functions for SQL:2016 compliance
Previous Message Amit Kapila 2019-02-02 02:00:18 Re: WIP: Avoid creation of the free space map for small tables