Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Subject: Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence
Date: 2019-08-25 23:19:10
Message-ID: CAH2-WznXowi-RTs86WPGxgF+K3CCa5_Ab_LB7wSKk_sHTuxO5Q@mail.gmail.com
Lists: pgsql-hackers

On Sun, Aug 25, 2019 at 2:55 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> I suppose that we'd add something new to CREATE OPERATOR CLASS to make
> this work? My instinct is to avoid adding things that are only
> meaningful for a single AM to interfaces like CREATE OPERATOR CLASS,
> but the system already has numerous dependencies on B-Tree opclasses
> that seem comparable to me.

Another question is whether or not it would be okay to define
"equality is precise"-ness to be "the system's generic equality
function works perfectly as a drop-in replacement for my own equality
operator's function". The system's generic equality function could be
the recently added datum_image_eq() function -- that looks like it
will do exactly what I have in mind. This would be a new way of using
datum_image_eq(), I think, since it wouldn't be okay for it to give an
answer that differed from the equality operator's function. It looks
like existing datum_image_eq() callers can deal with false negatives
(but not false positives, which are impossible).

This exceeds what is strictly necessary for the deduplication patch,
but it seems like the patch should make comparisons as fast as
possible in the context of deduplicating items (it would be nice if it
could just use datum_image_eq instead of an insertion scankey when
doing many comparisons to deduplicate items). We can imagine a
datatype whose representation includes undefined garbage bytes that
affect the answer that datum_image_eq() gives, and yet whose values
would otherwise be safe targets for deduplication, so it's not clear
that being this aggressive will work. But maybe that isn't actually
possible among types that aren't inherently unsafe for
deduplication. And maybe we could be more aggressive with
optimizations in numerous other contexts by defining "equality is
precise"-ness as strict binary equality after accounting for TOAST
compression.

--
Peter Geoghegan
