Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

From: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
Date: 2019-09-27 16:43:08
Message-ID: f5069d7e-91e6-635b-5bfe-dce4e18714e2@postgrespro.ru
Lists: pgsql-hackers

25.09.2019 22:14, Peter Geoghegan wrote:
>
>>> We still haven't added an "off" switch to deduplication, which seems
>>> necessary. I suppose that this should look like GIN's "fastupdate"
>>> storage parameter.
>> Why is it necessary to save this information anywhere other than
>> rel->rd_options, when we can easily access this field from
>> _bt_findinsertloc() and _bt_load()?
> Maybe, but we also need to access a flag that says it's safe to use
> deduplication. Obviously deduplication is not safe for datatypes like
> numeric and text with a nondeterministic collation. The "is
> deduplication safe for this index?" mechanism will probably work by
> doing several catalog lookups. This doesn't seem like something we
> want to do very often, especially with a buffer lock held -- ideally
> it will be somewhere that's convenient to access.
>
> Do we want to do that separately, and have a storage parameter that
> says "I would like to use deduplication in principle, if it's safe"?
> Or, do we store both pieces of information together, and forbid
> setting the storage parameter to on when it's known to be unsafe for
> the underlying opclasses used by the index? I don't know.
>
> I think that you can start working on this without knowing exactly how
> we'll do those catalog lookups. What you come up with has to work with
> that before the patch can be committed, though.
>
Attached is v19.

* It adds a new btree reloption, "deduplication".
I decided to refactor the code and move BtreeOptions into a separate
structure, rather than adding a new btree-specific value to StdRdOptions.
For now the option can be set even for indexes that do not support
deduplication; in that case it is ignored. Should we add this check to
option validation?

* By default deduplication is on for non-unique indexes and off for
unique ones.
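
To make the two points above concrete, here is a minimal standalone sketch of
how the effective setting could be resolved. It is not code from the patch:
BtreeOptionsSketch, DedupSetting, dedup_requested() and dedup_enabled() are
hypothetical names, and the tri-state field is an assumption about how "not
set by the user" might be told apart from an explicit on/off. A NULL options
pointer stands in for an index created without any reloptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tri-state: option left unset, or explicitly off/on. */
typedef enum DedupSetting
{
    DEDUP_UNSET,
    DEDUP_OFF,
    DEDUP_ON
} DedupSetting;

/* Hypothetical stand-in for a btree-specific options struct. */
typedef struct BtreeOptionsSketch
{
    int          fillfactor;
    DedupSetting deduplication;
} BtreeOptionsSketch;

/*
 * What the user asked for. With no explicit setting, fall back to the
 * default described above: on for non-unique indexes, off for unique ones.
 */
static bool
dedup_requested(const BtreeOptionsSketch *opts, bool is_unique)
{
    if (opts == NULL || opts->deduplication == DEDUP_UNSET)
        return !is_unique;

    assert(opts->deduplication == DEDUP_OFF ||
           opts->deduplication == DEDUP_ON);
    return opts->deduplication == DEDUP_ON;
}

/*
 * What the index actually does. The reloption is ignored when
 * deduplication is not safe for the index (dedup_safe is an assumed input
 * here; how it is computed is a separate question).
 */
static bool
dedup_enabled(const BtreeOptionsSketch *opts, bool is_unique, bool dedup_safe)
{
    return dedup_safe && dedup_requested(opts, is_unique);
}
```

Keeping the request and the safety check separate corresponds to the "ignore
the option when unsafe" behaviour; rejecting the option during validation
would instead move the dedup_safe test to the point where the option is set.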

* The new function _bt_dedup_is_possible() is intended to be the single place
that performs all the safety checks. For now it is just a stub to ensure that
the mechanism works.

Is there a way to extract this from existing opclass information,
or do we need to add a new opclass field? Have you already started this work?
I recall there was another thread about it, but I could not find it.
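
One possible shape for such a single check, given the concern quoted above
about repeating catalog lookups while a buffer lock is held, is to compute the
answer once and cache it. The sketch below is a standalone illustration under
that assumption, not the patch's _bt_dedup_is_possible(): IndexSketch, its
per-column flag array and the nlookups counter are all hypothetical, and the
"expensive" step is only a stand-in for whatever opclass or collation lookups
end up being needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Cached result of the safety check; UNKNOWN means "not computed yet". */
typedef enum DedupSafety
{
    DEDUP_SAFETY_UNKNOWN,
    DEDUP_SAFETY_NO,
    DEDUP_SAFETY_YES
} DedupSafety;

/* Hypothetical, heavily simplified view of an index. */
typedef struct IndexSketch
{
    int          nkeyatts;           /* number of key columns */
    const bool  *col_dedup_safe;     /* assumed per-column property */
    DedupSafety  cached;             /* filled in on first use */
    int          nlookups;           /* how often the slow path ran */
} IndexSketch;

/* Stand-in for the expensive part (e.g. several catalog lookups). */
static bool
dedup_check_columns(IndexSketch *index)
{
    index->nlookups++;
    for (int i = 0; i < index->nkeyatts; i++)
    {
        /* One unsafe key column makes the whole index unsafe. */
        if (!index->col_dedup_safe[i])
            return false;
    }
    return true;
}

/* Single entry point: run the slow check at most once per index. */
static bool
dedup_is_possible_sketch(IndexSketch *index)
{
    assert(index != NULL);

    if (index->cached == DEDUP_SAFETY_UNKNOWN)
        index->cached = dedup_check_columns(index) ?
            DEDUP_SAFETY_YES : DEDUP_SAFETY_NO;

    return index->cached == DEDUP_SAFETY_YES;
}
```

Where the cached value would actually live (a relcache field, the metapage, or
elsewhere) is exactly the open question in this thread; the sketch only shows
that callers on the insert path would not have to repeat the lookups.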

* I also integrated your latest patch that enables deduplication on unique
indexes into this version, since it can now be easily switched on and off.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
v19-0001-Add-deduplication-to-nbtree.patch text/x-patch 152.3 KB
