Re: [WIP] Effective storage of duplicates in B-tree index.

From: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [WIP] Effective storage of duplicates in B-tree index.
Date: 2016-01-28 16:12:54
Message-ID: 56AA3E06.8040006@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

28.01.2016 18:12, Thom Brown:
> On 28 January 2016 at 14:06, Anastasia Lubennikova
> <a(dot)lubennikova(at)postgrespro(dot)ru <mailto:a(dot)lubennikova(at)postgrespro(dot)ru>>
> wrote:
>
>
> 31.08.2015 10:41, Anastasia Lubennikova:
>> Hi, hackers!
>> I'm going to begin work on effective storage of duplicate keys in
>> B-tree index.
>> The main idea is to implement posting lists and posting trees for
>> B-tree index pages as it's already done for GIN.
>>
>> In a nutshell, effective storing of duplicates in GIN is
>> organised as follows.
>> Index stores single index tuple for each unique key. That index
>> tuple points to posting list which contains pointers to heap
>> tuples (TIDs). If too many rows having the same key, multiple
>> pages are allocated for the TIDs and these constitute so called
>> posting tree.
>> You can find wonderful detailed descriptions in gin readme
>> <https://github.com/postgres/postgres/blob/master/src/backend/access/gin/README>
>> and articles <http://www.cybertec.at/gin-just-an-index-type/>.
>> It also makes possible to apply compression algorithm to posting
>> list/tree and significantly decrease index size. Read more in
>> presentation (part 1)
>> <http://www.pgcon.org/2014/schedule/attachments/329_PGCon2014-GIN.pdf>.
>>
>> Now new B-tree index tuple must be inserted for each table row
>> that we index.
>> It can possibly cause page split. Because of MVCC even unique
>> index could contain duplicates.
>> Storing duplicates in posting list/tree helps to avoid
>> superfluous splits.
>
> I'd like to share the progress of my work. So here is a WIP patch.
> It provides effective duplicate handling using posting lists the
> same way as GIN does it.
>
> Layout of the tuples on the page is changed in the following way:
> before:
> TID (ip_blkid, ip_posid) + key, TID (ip_blkid, ip_posid) + key,
> TID (ip_blkid, ip_posid) + key
> with patch:
> TID (N item pointers, posting list offset) + key, TID (ip_blkid,
> ip_posid), TID (ip_blkid, ip_posid), TID (ip_blkid, ip_posid)
>
> It seems that backward compatibility works well without any
> changes. But I haven't tested it properly yet.
>
> Here are some test results. They are obtained by test functions
> test_btbuild and test_ginbuild, which you can find in attached sql
> file.
> i - number of distinct values in the index. So i=1 means that all
> rows have the same key, and i=10000000 means that all keys are
> different.
> The other columns contain the index size (MB).
>
> i B-tree Old B-tree New GIN
> 1 214,234375 87,7109375 10,2109375
> 10 214,234375 87,7109375 10,71875
> 100 214,234375 87,4375 15,640625
> 1000 214,234375 86,2578125 31,296875
> 10000 214,234375 78,421875 104,3046875
> 100000 214,234375 65,359375 49,078125
> 1000000 214,234375 90,140625 106,8203125
> 10000000 214,234375 214,234375 534,0625
>
>
> You can note that the last row contains the same index sizes for
> B-tree, which is quite logical - there is no compression if all
> the keys are distinct.
> Other cases looks really nice to me.
> Next thing to say is that I haven't implemented posting list
> compression yet. So there is still potential to decrease size of
> compressed btree.
>
> I'm almost sure, there are still some tiny bugs and missed
> functions, but on the whole, the patch is ready for testing.
> I'd like to get a feedback about the patch testing on some real
> datasets. Any bug reports and suggestions are welcome.
>
> Here is a couple of useful queries to inspect the data inside the
> index pages:
> create extension pageinspect;
> select * from bt_metap('idx');
> select bt.* from generate_series(1,1) as n, lateral
> bt_page_stats('idx', n) as bt;
> select n, bt.* from generate_series(1,1) as n, lateral
> bt_page_items('idx', n) as bt;
>
> And at last, the list of items I'm going to complete in the near
> future:
> 1. Add storage_parameter 'enable_compression' for btree access
> method which specifies whether the index handles duplicates.
> default is 'off'
> 2. Bring back microvacuum functionality for compressed indexes.
> 3. Improve insertion speed. Insertions became significantly slower
> with compressed btree, which is obviously not what we do want.
> 4. Clean the code and comments, add related documentation.
>
>
> This doesn't apply cleanly against current git head. Have you caught
> up past commit 65c5fcd35?

Thank you for the notice. New patch is attached.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
btree_compression_1.0(rebased).patch text/x-patch 57.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2016-01-28 16:17:27 insufficient qualification of some objects in dump files
Previous Message Robert Haas 2016-01-28 16:12:15 Re: Fwd: Core dump with nested CREATE TEMP TABLE