Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Subject:	Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence
Date:	2019-08-26 11:15:40
Message-ID:	3890.1566818140@antos
Lists:	pgsql-hackers

Peter Geoghegan <pg(at)bowt(dot)ie> wrote:

> Consumers of this new infrastructure probably won't be limited to the
> deduplication feature;

It'd also solve an open problem of the aggregate push-down patch [1]; in
particular, see the mention of pg_opclass in [2]: the partial aggregate
node below the final join must not put multiple opclass-equal values
that are not byte-wise equal into the same group, because some
information needed by a WHERE or JOIN/ON condition may be lost this
way. The scale of the numeric type is the most obvious example.
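
To make the numeric example concrete, here is a minimal sketch in Python. It is only an illustration, not code from either patch: decimal.Decimal stands in for PostgreSQL's numeric type (an assumption), and the helper names group_by_equality, group_by_representation and scale are hypothetical. Like numeric, Decimal treats 1.0 and 1.00 as equal while still remembering the scale of each value.

```python
from decimal import Decimal

# Stand-in for PostgreSQL numeric (assumption): Decimal compares equal
# across scales, yet keeps the scale as part of the stored value.
rows = [Decimal("1.0"), Decimal("1.00"), Decimal("2.5")]


def group_by_equality(values):
    """Group values the way an equality-based grouping would."""
    groups = {}
    for v in values:
        # Decimal hashes and compares by numeric value, so 1.0 and 1.00
        # land in the same group; the first key seen represents the group.
        groups.setdefault(v, []).append(v)
    return groups


def group_by_representation(values):
    """Group only values whose textual representation (incl. scale) matches."""
    groups = {}
    for v in values:
        groups.setdefault(str(v), []).append(v)
    return groups


def scale(v):
    """Number of digits after the decimal point (hypothetical helper)."""
    return -v.as_tuple().exponent


by_eq = group_by_equality(rows)
by_repr = group_by_representation(rows)

# Scales still visible to a later, scale-dependent condition.
eq_key_scales = sorted(scale(k) for k in by_eq)
repr_key_scales = sorted(scale(Decimal(k)) for k in by_repr)

print(len(by_eq), eq_key_scales)
print(len(by_repr), repr_key_scales)
```

In the equality-based grouping the value 1.00 is represented by 1.0, so a later condition that depends on the scale can no longer see it. That is the kind of information loss the partial aggregate has to avoid.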

> I would like to:
>
> * Get some buy-in on whether or not the precise distinctions I would
> like to make are correct for deduplication in particular, and as
> useful as possible for other cases that we may need to add later on.
>
> * Figure out the exact interface through which opclass/opfamily
> authors can represent that their notion of equality is compatible with
> deduplication/compression.

It's not entirely clear to me whether opclass or opfamily should carry
this information. opclass probably makes more sense for index-related
problems, and the aggregate push-down patch can live with that. I don't
see any particular reason to add a flag to opfamily. (The planner uses
both the pg_opclass and pg_opfamily catalogs.)

I think the fact that the aggregate push-down would benefit from this
enhancement should affect the choice of the new catalog attribute name,
i.e. it should not mention words as concrete as "deduplication" or
"compression".

> (I think that the use of nondeterministic collations should disable
> deduplication without explicit action from the operator class -- that
> should just be baked in.)

(I think the aggregate push-down needs to consider nondeterministic
collations too; I have missed that so far.)

[1] https://commitfest.postgresql.org/24/1247/

[2] https://www.postgresql.org/message-id/10529.1547561178%40localhost

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

In response to

Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence at 2019-08-25 20:29:09 from Peter Geoghegan

Responses

Re: Building infrastructure for B-Tree deduplication that recognizes when opclass equality is also equivalence at 2019-09-30 17:03:54 from Anastasia Lubennikova
