Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
Date: 2019-11-13 19:51:18
Message-ID: CAH2-Wz=zJa7Biitwz4K=C7UmaYdLWXarQ4uTwwAKSatEnd9TGQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 13, 2019 at 11:33 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Nov 12, 2019 at 6:22 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > * Disabled deduplication in system catalog indexes by deeming it
> > generally unsafe.
>
> I (continue to) think that deduplication is a terrible name, because
> you're not getting rid of the duplicates. You are using a compressed
> representation of the duplicates.

"Deduplication" never means that you get rid of duplicates. According
to Wikipedia's deduplication article: "Whereas compression algorithms
identify redundant data inside individual files and encodes this
redundant data more efficiently, the intent of deduplication is to
inspect large volumes of data and identify large sections – such as
entire files or large sections of files – that are identical, and
replace them with a shared copy".

This seemed like it fit what this patch does. We're concerned with a
specific, simple kind of redundancy. Also:

* From the user's point of view, we're merging together what they'd
call duplicates. They don't really think of the heap TID as part of
the key.

* The term "compression" suggests a decompression penalty when
reading, which is not the case here.

* The term "compression" confuses the feature added by the patch with
TOAST compression. Now we may have two very different varieties of
compression in the same index.

Can you suggest an alternative?

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-11-13 19:52:48 Re: [PATCH] gcc warning 'expression which evaluates to zero treated as a null pointer'
Previous Message Andrew Dunstan 2019-11-13 19:48:01 Re: ssl passphrase callback