Re: Enabling B-Tree deduplication by default

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Enabling B-Tree deduplication by default
Date: 2020-06-25 23:28:22
Message-ID: CAH2-Wzm1u8HmCamGj2LmtvUudzai5qDJryTotu++JLLD9KVMRw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 30, 2020 at 11:40 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> I think that I should commit the patch without the GUC tentatively.
> Just have the storage parameter, so that everyone gets the
> optimization without asking for it. We can then review the decision to
> enable deduplication generally after the feature has been in the tree
> for several months.

This is how things work in the committed patch (commit 0d861bbb):
There is a B-Tree storage parameter named deduplicate_items, which is
enabled by default. In general, users will get deduplication unless
they opt out, including in unique indexes (though note that we're more
selective about triggering a deduplication pass in unique indexes
[1]).
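
For anyone who wants to opt out on a particular index, here is a minimal
sketch of what the storage parameter looks like in practice. It is not
part of the committed patch; it is just a small Python helper that
renders the relevant DDL as strings. The index, table, and column names
are hypothetical, and the helper functions themselves are illustrative
assumptions. Only the deduplicate_items storage parameter and the
CREATE INDEX ... WITH / ALTER INDEX ... SET forms come from PostgreSQL.

```python
def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def create_index_sql(index: str, table: str, column: str,
                     deduplicate: bool = True) -> str:
    """Render CREATE INDEX with an explicit deduplicate_items setting.

    The default mirrors the committed behaviour: deduplication is on
    unless the user opts out.
    """
    setting = "on" if deduplicate else "off"
    return (
        f"CREATE INDEX {quote_ident(index)} ON {quote_ident(table)} "
        f"({quote_ident(column)}) WITH (deduplicate_items = {setting});"
    )


def alter_index_sql(index: str, deduplicate: bool) -> str:
    """Render ALTER INDEX to change the setting on an existing index."""
    setting = "on" if deduplicate else "off"
    return (
        f"ALTER INDEX {quote_ident(index)} "
        f"SET (deduplicate_items = {setting});"
    )


if __name__ == "__main__":
    # Hypothetical names, used only for illustration.
    print(create_index_sql("orders_customer_idx", "orders",
                           "customer_id", deduplicate=False))
    print(alter_index_sql("orders_customer_idx", deduplicate=False))
```

As far as I know, turning the parameter off with ALTER INDEX only
prevents future insertions from triggering deduplication; posting list
tuples that already exist in the index stay as they are unless the
index is rebuilt.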

> There is no need to make a final decision about whether or not the
> optimization gets enabled before committing the patch.

It's now time to make a final decision on this. Does anyone have any
reason to believe that leaving deduplication enabled by default is the
wrong way to go?

Note that using deduplication isn't strictly better than not using
deduplication for all indexes in all workloads; that's why it's
possible to disable the optimization. This thread has lots of
information about the reasons why enabling deduplication by default
seems appropriate, all of which still apply. There have been no bug
reports involving deduplication since it was committed on February
26th, with the exception of some minor issues that I reported and
fixed.

The view of the RMT is that the feature should remain enabled by
default (i.e. no changes are required). Of course, I am a member of
the RMT this year, as well as one of the authors of the patch, so I am
hardly an impartial voice here. I don't believe that this swayed the
RMT's decision-making process in this instance, though. If there are
no objections in the next week or so, then I'll close out the relevant
open item.

[1] https://www.postgresql.org/docs/devel/btree-implementation.html#BTREE-DEDUPLICATION
-- See "Tip"
--
Peter Geoghegan
