Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

From: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
Date: 2019-09-25 15:05:03
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

24.09.2019 3:13, Peter Geoghegan wrote:
> On Wed, Sep 18, 2019 at 7:25 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>> I attach version 16. This revision merges your recent work on WAL
>> logging with my recent work on simplifying _bt_dedup_one_page(). See
>> my e-mail from earlier today for details.
> I attach version 17. This version has changes that are focussed on
> further polishing certain things, including fixing some minor bugs. It
> seemed worth creating a new version for that. (I didn't get very far
> with the space utilization stuff I talked about, so no changes there.)
Attached is v18. In this version bt_dedup_one_page() is refactored so that:
- no temp page is used, all updates are applied to the original page.
- each posting tuple wal logged separately.
This also allowed to simplify btree_xlog_dedup significantly.

> Another infrastructure thing that the patch needs to handle to be committable:
> We still haven't added an "off" switch to deduplication, which seems
> necessary. I suppose that this should look like GIN's "fastupdate"
> storage parameter. It's not obvious how to do this in a way that's
> easy to work with, though. Maybe we could do something like copy GIN's
> GinGetUseFastUpdate() macro, but the situation with nbtree is actually
> quite different. There are two questions for nbtree when it comes to
> deduplication within an inde: 1) Does the user want to use
> deduplication, because that will help performance?, and 2) Is it
> safe/possible to use deduplication at all?
I'll send another version with dedup option soon.

> I think that we should probably stash this information (deduplication
> is both possible and safe) in the metapage. Maybe we can copy it over
> to our insertion scankey, just like the "heapkeyspace" field -- that
> information also comes from the metapage (it's based on the nbtree
> version). The "heapkeyspace" field is a bit ugly, so maybe we
> shouldn't go further by adding something similar, but I don't see any
> great alternative right now.
Why is it necessary to save this information somewhere but rel->rd_options,
while we can easily access this field from _bt_findinsertloc() and

Anastasia Lubennikova
Postgres Professional:
The Russian Postgres Company

Attachment Content-Type Size
v18-0001-Add-deduplication-to-nbtree.patch text/x-patch 137.5 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2019-09-25 15:22:19 Re: Proposal for syntax to support creation of partition tables when creating parent table
Previous Message Liudmila Mantrova 2019-09-25 14:46:08 Re: JSONPATH documentation