Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Юрий Соколов <funny(dot)falcon(at)gmail(dot)com>
Subject: Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
Date: 2020-02-19 19:16:12
Message-ID: CAH2-Wzk2p_9i7i-s22zVGdJyExaneNCcrwH4BtEDZqr73D6v_A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 19, 2020 at 8:14 AM Anastasia Lubennikova
<a(dot)lubennikova(at)postgrespro(dot)ru> wrote:
> Thank you for this work. I've looked through the patches and they seem
> to be ready for commit.
> I haven't yet read recent documentation and readme changes, so maybe
> I'll send some more feedback tomorrow.

Great.

> Is there any specific reason, why we need separate btnameequalimage,
> bpchar_equalimage and bttextequalimage functions?
> As far as I see, they have the same implementation.

Not really. This approach allows us to reverse the decision to enable
deduplication in a point release, which is theoretically useful. OTOH,
if that's so important, why not have many more support function 4
implementations (one per opclass)?

I suspect that we would just disable deduplication in a hard-coded
fashion if we needed to disable it due to some issue that transpired.
For example, we could do this by modifying _bt_allequalimage() itself.

> I would simply move it to debug level for all cases. Since from user's
> perspective it doesn't differ that much from the case where
> deduplication is applicable in general, but not very efficient due to
> data distribution.

I was more concerned about cases where the user would really like to
use deduplication, but wants to make sure that it gets used. And
doesn't want to install pageinspect to find out.

> I also noticed that this is not consistent with ALTER index. For
> example, alter index idx_n set (deduplicate_items =true); won't show any
> message about deduplication.

But that's a change in the user's preference. Not a change in whether
or not it's safe in principle.

> In my opinion, this message is too specific for default behavior. It
> exposes internal details without explanation and may look to user like
> something went wrong.

You're probably right about that. I just wish that there was some way
of showing the same information that was discoverable, and didn't
require the use of pageinspect. If I make it a DEBUG1 message, then it
cannot really be documented.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2020-02-19 19:16:36 Re: Memory-Bounded Hash Aggregation
Previous Message David Steele 2020-02-19 18:42:53 Re: [Patch] Make pg_checksums skip foreign tablespace directories