Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence

From: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
To: Antonin Houska <ah(at)cybertec(dot)at>, Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence
Date: 2019-09-30 17:03:54
Message-ID: daca43e3-3857-b933-4194-64d4c8ff261f@postgrespro.ru
Lists: pgsql-hackers

26.08.2019 14:15, Antonin Houska wrote:
> Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
>> Consumers of this new infrastructure probably won't be limited to the
>> deduplication feature;
> It'd also solve an open problem of the aggregate push-down patch [1], in
> particular see the mention of pg_opclass in [2]: the partial aggregate
> node below the final join must not put multiple opclass-equal values
> that are not byte-wise equal into the same group because some
> information needed by WHERE or JOIN/ON condition may be lost this
> way. The scale of the numeric type is the most obvious example.
>
>> I would like to:
>>
>> * Get some buy-in on whether or not the precise distinctions I would
>> like to make are correct for deduplication in particular, and as
>> useful as possible for other cases that we may need to add later on.
>>
>> * Figure out the exact interface through which opclass/opfamily
>> authors can represent that their notion of equality is compatible with
>> deduplication/compression.
> It's not entirely clear to me whether opclass or opfamily should carry
> this information. opclass probably makes more sense for index related
> problems and the aggregate push-down patch can live with that. I don't
> see a particular reason to add any flag to opfamily. (The planner uses
> both pg_opclass and pg_opfamily catalogs.)
>
> I think the fact that the aggregate push-down would benefit from this
> enhancement should affect choice of the new catalog attribute name,
> i.e. it should not mention words as concrete as "deduplication" or
> "compression".

A patch implementing the new opclass option is attached.

It adds a new attribute, pg_opclass.opcisbitwise, which is true if the
opclass's equality is the same as binary equality.
It defaults to true and is set to false for numeric, float4, and float8.
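
To illustrate why these types cannot be treated as bitwise, here is a minimal sketch in Python. It is not PostgreSQL code: Python's `struct` and `Decimal` are only stand-ins for float8 and numeric, and the helper names are hypothetical. It shows two values that compare equal while their underlying representations differ.

```python
import struct
from decimal import Decimal


def float8_bytes(x: float) -> bytes:
    """Return the 8-byte IEEE 754 representation of x (a stand-in for float8)."""
    return struct.pack("<d", x)


def is_bitwise_equal(a: float, b: float) -> bool:
    """Hypothetical helper: True only if both values have identical bytes."""
    return float8_bytes(a) == float8_bytes(b)


# Signed zeros: equal according to the comparison operator,
# but the sign bit differs, so the byte images are not the same.
pos_zero, neg_zero = 0.0, -0.0
float_equal = pos_zero == neg_zero
float_bitwise = is_bitwise_equal(pos_zero, neg_zero)

# Display scale: Decimal is used here as an analogy for numeric.
# The two values compare equal, yet they carry different scales,
# so merging them into one would lose the trailing zero.
a, b = Decimal("1.0"), Decimal("1.00")
decimal_equal = a == b
decimal_same_repr = a.as_tuple() == b.as_tuple()

print(float_equal, float_bitwise)
print(decimal_equal, decimal_same_repr, str(a), str(b))
```

In both cases equality holds but the representations differ, which is the situation where keeping only one copy of "equal" values would lose information.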

Do the anyarray opclasses need special treatment?

The new CREATE OPERATOR CLASS syntax is "CREATE OPERATOR CLASS NOT BITWISE ..."

Any ideas on better names?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
v1-Opclass-bitwise-equality.patch text/x-patch 9.3 KB
