Re: Enabling B-Tree deduplication by default

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: Enabling B-Tree deduplication by default
Date: 2020-01-29 01:36:39
Message-ID: CAH2-WzmByJciGE7ZNvL5=c+bU1y44+Aho_a_YrEcz3bnGeU4qQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 16, 2020 at 12:05 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > It does seem odd to me to treat them differently, but it's possible
> > that this is a reflection of my own lack of understanding. What do
> > other database systems do?
>
> Other database systems treat unique indexes very differently, albeit
> in a way that we're not really in a position to take too much away
> from -- other than the general fact that unique indexes can be thought
> of as very different things.

I should point out here that I've just posted v31 of the patch, which
changes things for unique indexes. Our strategy during deduplication
is now the same for unique indexes as for other indexes, since the
original, super-incremental approach no longer seems to make sense.
Further optimization work in the patch eliminated the problems that
had made that approach seem like it might be worthwhile.

Note, however, that v31 changes nothing about how we think about
deduplication in unique indexes in general, nor how it is presented to
users. There are still special criteria around how deduplication is
*triggered* in unique indexes. We continue to trigger a deduplication
pass based on seeing a duplicate within _bt_check_unique() +
_bt_findinsertloc() -- otherwise we never attempt deduplication in a
unique index (same as before). Plus, the GUC still doesn't affect
unique indexes, unique index deduplication still isn't really
documented in the user docs (it just gets a passing mention in the
B-Tree internals section), etc. This seems like the right way to go, since
deduplication in unique indexes can only make sense on leaf pages
where most or all new items are duplicates of existing items, a
situation that is already easy to detect.
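
The triggering rule described above can be modeled roughly as follows.
This is only a sketch of the decision, not code from the patch: the
struct, its fields, and should_attempt_dedup() are hypothetical names
made up for illustration, and the sketch leaves out every other
condition the real insertion path checks (such as whether the page has
room for the new item).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the state available at insertion time.
 * None of these names come from the nbtree code.
 */
typedef struct InsertContext
{
    bool index_is_unique;   /* index enforces uniqueness */
    bool saw_duplicate;     /* a duplicate was seen while checking
                             * uniqueness / finding the insert location */
    bool dedup_setting_on;  /* stand-in for the user-visible setting (GUC) */
} InsertContext;

/*
 * Decide whether a deduplication pass should be attempted.
 *
 * Unique index: only when a duplicate was actually seen; the setting
 * is ignored entirely.
 * Other indexes: controlled by the setting alone in this simplified
 * model.
 */
static bool
should_attempt_dedup(const InsertContext *ctx)
{
    assert(ctx != NULL);

    if (ctx->index_is_unique)
        return ctx->saw_duplicate;

    return ctx->dedup_setting_on;
}
```

In this model, a unique index that never sees a duplicate never pays
for a deduplication pass, while turning the setting off has no effect
on a unique index that does see one.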

It wouldn't be that bad if we always attempted deduplication in a
unique index, but it's easy to only do it when we're pretty confident
we'll get a benefit -- why not save a few cycles?

--
Peter Geoghegan
