|From:||Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>|
|Subject:||Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.|
24.09.2019 3:13, Peter Geoghegan wrote:
> On Wed, Sep 18, 2019 at 7:25 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>> I attach version 16. This revision merges your recent work on WAL
>> logging with my recent work on simplifying _bt_dedup_one_page(). See
>> my e-mail from earlier today for details.
> I attach version 17. This version has changes that are focussed on
> further polishing certain things, including fixing some minor bugs. It
> seemed worth creating a new version for that. (I didn't get very far
> with the space utilization stuff I talked about, so no changes there.)
Attached is v18. In this version _bt_dedup_one_page() is refactored so that:
- no temp page is used; all updates are applied to the original page.
- each posting tuple is WAL-logged separately.
This also made it possible to simplify btree_xlog_dedup significantly.
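
To make the two points above concrete, here is a toy model of in-place merging with one log record per posting tuple. It is only a sketch and not the code from the patch: ToyTuple, toy_log_posting() and toy_dedup_in_place() are hypothetical names, the "page" is a plain array sorted by key, and the "WAL record" is just a counter plus a printf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_TIDS 8

/* Toy stand-in for an index tuple: one key plus a small list of heap TIDs. */
typedef struct ToyTuple
{
    int key;
    int ntids;
    int tids[MAX_TIDS];
} ToyTuple;

static int wal_records = 0;

/* Hypothetical stand-in for emitting one WAL record for one posting tuple. */
static void
toy_log_posting(const ToyTuple *t)
{
    wal_records++;
    printf("WAL: posting tuple key=%d ntids=%d\n", t->key, t->ntids);
}

/*
 * Merge runs of equal keys directly in the array (no temporary copy).
 * The input must be sorted by key.  Returns the new number of tuples.
 */
static size_t
toy_dedup_in_place(ToyTuple *page, size_t n)
{
    size_t out = 0;     /* number of tuples kept so far */
    int    merged = 0;  /* did page[out - 1] absorb a duplicate? */

    for (size_t i = 0; i < n; i++)
    {
        ToyTuple *last = (out > 0) ? &page[out - 1] : NULL;

        assert(page[i].ntids >= 1 && page[i].ntids <= MAX_TIDS);

        if (last != NULL && last->key == page[i].key &&
            last->ntids + page[i].ntids <= MAX_TIDS)
        {
            /* Same key and enough room: append the TIDs to the kept tuple. */
            for (int j = 0; j < page[i].ntids; j++)
                last->tids[last->ntids++] = page[i].tids[j];
            merged = 1;
            continue;
        }

        /* The previous run is finished; log it only if it was changed. */
        if (merged)
            toy_log_posting(last);
        merged = 0;
        page[out++] = page[i];
    }

    if (merged)
        toy_log_posting(&page[out - 1]);
    return out;
}
```

With five single-TID tuples whose keys are 1, 1, 2, 3, 3, the array shrinks to three tuples and two records are logged, one for each tuple that actually became a posting list.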
> Another infrastructure thing that the patch needs to handle to be committable:
> We still haven't added an "off" switch to deduplication, which seems
> necessary. I suppose that this should look like GIN's "fastupdate"
> storage parameter. It's not obvious how to do this in a way that's
> easy to work with, though. Maybe we could do something like copy GIN's
> GinGetUseFastUpdate() macro, but the situation with nbtree is actually
> quite different. There are two questions for nbtree when it comes to
> deduplication within an index: 1) Does the user want to use
> deduplication, because that will help performance?, and 2) Is it
> safe/possible to use deduplication at all?
I'll send another version with a dedup option soon.
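
As a rough illustration of the two questions quoted above, the final decision could be the conjunction of a user-level setting and a safety flag. This is a standalone sketch under stated assumptions, not the actual nbtree code: ToyRelOptions, ToyMetaPage, their fields and toy_use_dedup() are all hypothetical, and defaulting to "on" when no options are set is an assumption that only mirrors the check-for-NULL-then-default shape of GinGetUseFastUpdate().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-index user setting, in the spirit of a storage parameter. */
typedef struct ToyRelOptions
{
    bool deduplication;     /* question 1: does the user want it? */
} ToyRelOptions;

/* Hypothetical metapage contents. */
typedef struct ToyMetaPage
{
    unsigned version;
    bool     dedup_is_safe; /* question 2: is it safe/possible at all? */
} ToyMetaPage;

/*
 * Deduplicate only if the user has not switched it off AND it is safe for
 * this index.  A NULL options pointer means "no options were set", in which
 * case the assumed default (on) applies.
 */
static bool
toy_use_dedup(const ToyRelOptions *opts, const ToyMetaPage *meta)
{
    bool wanted = (opts != NULL) ? opts->deduplication : true;

    return wanted && meta->dedup_is_safe;
}
```

Keeping the two inputs separate means the user's "off" switch can never enable deduplication on an index where it is unsafe; where each input is stored is exactly the open question in this thread.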
> I think that we should probably stash this information (deduplication
> is both possible and safe) in the metapage. Maybe we can copy it over
> to our insertion scankey, just like the "heapkeyspace" field -- that
> information also comes from the metapage (it's based on the nbtree
> version). The "heapkeyspace" field is a bit ugly, so maybe we
> shouldn't go further by adding something similar, but I don't see any
> great alternative right now.
Why is it necessary to save this information anywhere other than rel->rd_options,
when we can easily access this field from _bt_findinsertloc() and
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company