Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Antonin Houska <ah(at)cybertec(dot)at>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence
Date: 2020-01-02 17:11:12
Message-ID: CAH2-Wz=3pmbGrtBf_oFc0wS=aDxdsTCink7yKbRSw0gu92P5Vw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 2, 2020 at 6:42 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Dec 30, 2019 at 6:58 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > I propose that we adopt the following definition: For an operator
> > class to be safe, its equality operator has to always agree with
> > datum_image_eq() (i.e. two datums must be bitwise equal after
> > detoasting).
>
> I suggested using datumIsEqual() as the canonical definition. (I
> wonder why datum_image_eq() does not reuse that function?)

The difference between datum_image_eq() and datumIsEqual() is that
only the former will consider two datums equal when they happen to
have different TOAST input states -- we need that here. datumIsEqual()
avoids doing this because sometimes it needs to work for callers
operating within an aborted transaction. datum_image_eq() was
originally used for the "*=, *<>, *<, *<=, *>, and *>=" rowtype B-Tree
operator class needed by REFRESH MATERIALIZED VIEW CONCURRENTLY.
(Actually, that's not quite true, since datum_image_eq() is a spin-off
of the rowtype code that was added much more recently to fix a bug in
foreign keys.)
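
To make the distinction concrete, here is a toy sketch in C. It is not
PostgreSQL code and does not call the real functions: ToyDatum,
toy_plain(), toy_compressed(), toy_raw_eq() and toy_image_eq() are all
hypothetical names, and run-length encoding merely stands in for TOAST
compression. Assuming that model, toy_raw_eq() plays the role of a
comparison on the stored bytes (roughly what datumIsEqual() does), while
toy_image_eq() expands both sides first (roughly what datum_image_eq()
does).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_VALUE 64        /* longest logical value this toy accepts */

/* Toy stand-in for a varlena datum; NOT PostgreSQL's real representation. */
typedef struct ToyDatum
{
    bool          compressed;   /* hypothetical "TOAST input state" */
    size_t        len;          /* number of stored bytes */
    unsigned char bytes[2 * TOY_MAX_VALUE]; /* stored form */
} ToyDatum;

/* Store the value as-is (analogous to an uncompressed datum). */
static ToyDatum
toy_plain(const char *s)
{
    ToyDatum d = {.compressed = false};

    d.len = strlen(s);
    assert(d.len <= TOY_MAX_VALUE);
    memcpy(d.bytes, s, d.len);
    return d;
}

/* Store the value as (run length, byte) pairs (stand-in for compression). */
static ToyDatum
toy_compressed(const char *s)
{
    ToyDatum d = {.compressed = true};
    size_t   n = strlen(s);
    size_t   i = 0;

    assert(n <= TOY_MAX_VALUE);
    while (i < n)
    {
        size_t run = 1;

        while (i + run < n && s[i + run] == s[i])
            run++;
        d.bytes[d.len++] = (unsigned char) run;
        d.bytes[d.len++] = (unsigned char) s[i];
        i += run;
    }
    return d;
}

/* Expand a datum to its logical bytes; returns the expanded length. */
static size_t
toy_expand(const ToyDatum *d, unsigned char *out, size_t cap)
{
    size_t n = 0;

    if (!d->compressed)
    {
        assert(d->len <= cap);
        memcpy(out, d->bytes, d->len);
        return d->len;
    }
    for (size_t i = 0; i + 1 < d->len; i += 2)
    {
        size_t run = d->bytes[i];

        assert(n + run <= cap);
        memset(out + n, d->bytes[i + 1], run);
        n += run;
    }
    return n;
}

/* Compare stored forms only: differing storage states never match. */
static bool
toy_raw_eq(const ToyDatum *a, const ToyDatum *b)
{
    return a->compressed == b->compressed &&
        a->len == b->len &&
        memcmp(a->bytes, b->bytes, a->len) == 0;
}

/* Compare logical images: expand both sides, then compare bytewise. */
static bool
toy_image_eq(const ToyDatum *a, const ToyDatum *b)
{
    unsigned char ea[TOY_MAX_VALUE];
    unsigned char eb[TOY_MAX_VALUE];
    size_t        la = toy_expand(a, ea, sizeof(ea));
    size_t        lb = toy_expand(b, eb, sizeof(eb));

    return la == lb && memcmp(ea, eb, la) == 0;
}
```

In this model, toy_plain("aaaabbbb") and toy_compressed("aaaabbbb")
have different stored sizes (8 bytes versus 4), so toy_raw_eq() reports
them as different, while toy_image_eq() reports them as equal.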

The B-Tree code and amcheck need to be tolerant of inconsistent TOAST
input states. This isn't particularly likely to happen, but it would
be hard to revoke the general assumption that it's okay at this point.
Also, it's not that hard to deal with directly. For example, the
deduplication patch doesn't rely on equal index tuples all being the
same size.

--
Peter Geoghegan
