Re: Enabling B-Tree deduplication by default

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: Enabling B-Tree deduplication by default
Date: 2020-01-29 14:56:39
Message-ID: CA+TgmoagOYu3361LnRn7_nLswOqpQNUa3cY-GpxxWSKho4j=8Q@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 16, 2020 at 3:05 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> The main reason that I am confident about unique indexes is that we
> only do a deduplication pass in a unique index when we observe that
> the incoming tuple (the one that might end up splitting the page) is a
> duplicate of some existing tuple. Checking that much is virtually
> free, since we already have the information close at hand today (we
> cache the _bt_check_unique() binary search bounds for reuse within
> _bt_findinsertloc() today). This seems to be an excellent heuristic,
> since we really only want to target unique index leaf pages where all
> or almost all insertions must be duplicates caused by non-HOT updates
> -- this category includes all the pgbench indexes, and includes all of
> the unique indexes in TPC-C. Whereas with non-unique indexes, we
> aren't specifically targeting version churn (though it will help with
> that too).

This (and the rest of the explanation) doesn't really address my
concern. I understand that deduplicating in lieu of splitting a page
in a unique index is highly likely to be a win. What I don't
understand is why it shouldn't just be a win, period. Not splitting a
page seems like it has a big upside regardless of whether the index is
unique -- and in fact, the upside could be a lot bigger for a
non-unique index. If the coarse-grained LP_DEAD thing is the problem,
then I can grasp that issue, but you don't seem very worried about
that.

Generally, I think it's a bad idea to give the user an "emergency off
switch" and then sometimes ignore it. If the feature seems to be
generally beneficial, but you're worried that there might be
regressions in obscure cases, then turn it on by default, and give the
user the ability to forcibly turn it off. But don't give them the
opportunity to forcibly turn it off sometimes. Nobody's going to run
around setting a reloption just for fun -- they're going to do it
because they hit a problem.
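
To make that policy concrete, here is a minimal sketch in Python rather than PostgreSQL code. The names IndexOptions, deduplicate, is_unique and should_deduplicate are hypothetical and chosen only for illustration; the point is that an explicit per-index setting always wins, and that whether the index is unique is never consulted when the user has turned the feature off.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: the feature ships enabled by default.
DEFAULT_DEDUPLICATE = True


@dataclass
class IndexOptions:
    """Hypothetical per-index options (a stand-in for a reloption)."""
    is_unique: bool = False
    # None means the user never set the option on this index.
    deduplicate: Optional[bool] = None


def should_deduplicate(opts: IndexOptions) -> bool:
    """Decide whether a deduplication pass may run for this index.

    An explicit user setting is always honored. Uniqueness is
    deliberately ignored, so "off" really means off everywhere.
    """
    if opts.deduplicate is not None:
        return opts.deduplicate
    return DEFAULT_DEDUPLICATE
```

With this shape, an index whose option was never set gets the default, and an index where the user set the option to off never sees a deduplication pass, unique or not.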

I guess I'm also saying here that a reloption seems like a much better
idea than a GUC. I don't see much reason to believe that a system-wide
setting will be useful.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
