Re: [WIP] Effective storage of duplicates in B-tree index.

From: Thom Brown <thom(at)linux(dot)com>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [WIP] Effective storage of duplicates in B-tree index.
Date: 2016-01-28 15:12:36
Message-ID: CAA-aLv68Ptrnk2HpHSg9wYR0c+D2AyTB+XrS671brxqQz3dH3w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 28 January 2016 at 14:06, Anastasia Lubennikova <
a(dot)lubennikova(at)postgrespro(dot)ru> wrote:

>
> 31.08.2015 10:41, Anastasia Lubennikova:
>
> Hi, hackers!
> I'm going to begin work on effective storage of duplicate keys in B-tree
> index.
> The main idea is to implement posting lists and posting trees for B-tree
> index pages as it's already done for GIN.
>
> In a nutshell, effective storing of duplicates in GIN is organised as
> follows.
> Index stores single index tuple for each unique key. That index tuple
> points to posting list which contains pointers to heap tuples (TIDs). If
> too many rows having the same key, multiple pages are allocated for the
> TIDs and these constitute so called posting tree.
> You can find wonderful detailed descriptions in gin readme
> <https://github.com/postgres/postgres/blob/master/src/backend/access/gin/README>
> and articles <http://www.cybertec.at/gin-just-an-index-type/>.
> It also makes possible to apply compression algorithm to posting list/tree
> and significantly decrease index size. Read more in presentation (part 1)
> <http://www.pgcon.org/2014/schedule/attachments/329_PGCon2014-GIN.pdf>.
>
> Now new B-tree index tuple must be inserted for each table row that we
> index.
> It can possibly cause page split. Because of MVCC even unique index could
> contain duplicates.
> Storing duplicates in posting list/tree helps to avoid superfluous splits.
>
>
> I'd like to share the progress of my work. So here is a WIP patch.
> It provides effective duplicate handling using posting lists the same way
> as GIN does it.
>
> Layout of the tuples on the page is changed in the following way:
> before:
> TID (ip_blkid, ip_posid) + key, TID (ip_blkid, ip_posid) + key, TID
> (ip_blkid, ip_posid) + key
> with patch:
> TID (N item pointers, posting list offset) + key, TID (ip_blkid,
> ip_posid), TID (ip_blkid, ip_posid), TID (ip_blkid, ip_posid)
>
> It seems that backward compatibility works well without any changes. But I
> haven't tested it properly yet.
>
> Here are some test results. They are obtained by test functions
> test_btbuild and test_ginbuild, which you can find in attached sql file.
> i - number of distinct values in the index. So i=1 means that all rows
> have the same key, and i=10000000 means that all keys are different.
> The other columns contain the index size (MB).
>
> i B-tree Old B-tree New GIN
> 1 214,234375 87,7109375 10,2109375
> 10 214,234375 87,7109375 10,71875
> 100 214,234375 87,4375 15,640625
> 1000 214,234375 86,2578125 31,296875
> 10000 214,234375 78,421875 104,3046875
> 100000 214,234375 65,359375 49,078125
> 1000000 214,234375 90,140625 106,8203125
> 10000000 214,234375 214,234375 534,0625
> You can note that the last row contains the same index sizes for B-tree,
> which is quite logical - there is no compression if all the keys are
> distinct.
> Other cases looks really nice to me.
> Next thing to say is that I haven't implemented posting list compression
> yet. So there is still potential to decrease size of compressed btree.
>
> I'm almost sure, there are still some tiny bugs and missed functions, but
> on the whole, the patch is ready for testing.
> I'd like to get a feedback about the patch testing on some real datasets.
> Any bug reports and suggestions are welcome.
>
> Here is a couple of useful queries to inspect the data inside the index
> pages:
> create extension pageinspect;
> select * from bt_metap('idx');
> select bt.* from generate_series(1,1) as n, lateral bt_page_stats('idx',
> n) as bt;
> select n, bt.* from generate_series(1,1) as n, lateral
> bt_page_items('idx', n) as bt;
>
> And at last, the list of items I'm going to complete in the near future:
> 1. Add storage_parameter 'enable_compression' for btree access method
> which specifies whether the index handles duplicates. default is 'off'
> 2. Bring back microvacuum functionality for compressed indexes.
> 3. Improve insertion speed. Insertions became significantly slower with
> compressed btree, which is obviously not what we do want.
> 4. Clean the code and comments, add related documentation.
>

This doesn't apply cleanly against current git head. Have you caught up
past commit 65c5fcd35?

Thom

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2016-01-28 15:17:52 Re: extend pgbench expressions with functions
Previous Message Alvaro Herrera 2016-01-28 15:08:31 Re: [PATCH] better systemd integration