|From:||Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>|
|Subject:||Re: [WIP] Effective storage of duplicates in B-tree index.|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
28.01.2016 18:12, Thom Brown:
> On 28 January 2016 at 14:06, Anastasia Lubennikova
> <a(dot)lubennikova(at)postgrespro(dot)ru <mailto:a(dot)lubennikova(at)postgrespro(dot)ru>>
> 31.08.2015 10:41, Anastasia Lubennikova:
>> Hi, hackers!
>> I'm going to begin work on effective storage of duplicate keys in
>> B-tree index.
>> The main idea is to implement posting lists and posting trees for
>> B-tree index pages as it's already done for GIN.
>> In a nutshell, effective storing of duplicates in GIN is
>> organised as follows.
>> Index stores single index tuple for each unique key. That index
>> tuple points to posting list which contains pointers to heap
>> tuples (TIDs). If too many rows having the same key, multiple
>> pages are allocated for the TIDs and these constitute so called
>> posting tree.
>> You can find wonderful detailed descriptions in gin readme
>> and articles <http://www.cybertec.at/gin-just-an-index-type/>.
>> It also makes possible to apply compression algorithm to posting
>> list/tree and significantly decrease index size. Read more in
>> presentation (part 1)
>> Now new B-tree index tuple must be inserted for each table row
>> that we index.
>> It can possibly cause page split. Because of MVCC even unique
>> index could contain duplicates.
>> Storing duplicates in posting list/tree helps to avoid
>> superfluous splits.
> I'd like to share the progress of my work. So here is a WIP patch.
> It provides effective duplicate handling using posting lists the
> same way as GIN does it.
> Layout of the tuples on the page is changed in the following way:
> TID (ip_blkid, ip_posid) + key, TID (ip_blkid, ip_posid) + key,
> TID (ip_blkid, ip_posid) + key
> with patch:
> TID (N item pointers, posting list offset) + key, TID (ip_blkid,
> ip_posid), TID (ip_blkid, ip_posid), TID (ip_blkid, ip_posid)
> It seems that backward compatibility works well without any
> changes. But I haven't tested it properly yet.
> Here are some test results. They are obtained by test functions
> test_btbuild and test_ginbuild, which you can find in attached sql
> i - number of distinct values in the index. So i=1 means that all
> rows have the same key, and i=10000000 means that all keys are
> The other columns contain the index size (MB).
> i B-tree Old B-tree New GIN
> 1 214,234375 87,7109375 10,2109375
> 10 214,234375 87,7109375 10,71875
> 100 214,234375 87,4375 15,640625
> 1000 214,234375 86,2578125 31,296875
> 10000 214,234375 78,421875 104,3046875
> 100000 214,234375 65,359375 49,078125
> 1000000 214,234375 90,140625 106,8203125
> 10000000 214,234375 214,234375 534,0625
> You can note that the last row contains the same index sizes for
> B-tree, which is quite logical - there is no compression if all
> the keys are distinct.
> Other cases looks really nice to me.
> Next thing to say is that I haven't implemented posting list
> compression yet. So there is still potential to decrease size of
> compressed btree.
> I'm almost sure, there are still some tiny bugs and missed
> functions, but on the whole, the patch is ready for testing.
> I'd like to get a feedback about the patch testing on some real
> datasets. Any bug reports and suggestions are welcome.
> Here is a couple of useful queries to inspect the data inside the
> index pages:
> create extension pageinspect;
> select * from bt_metap('idx');
> select bt.* from generate_series(1,1) as n, lateral
> bt_page_stats('idx', n) as bt;
> select n, bt.* from generate_series(1,1) as n, lateral
> bt_page_items('idx', n) as bt;
> And at last, the list of items I'm going to complete in the near
> 1. Add storage_parameter 'enable_compression' for btree access
> method which specifies whether the index handles duplicates.
> default is 'off'
> 2. Bring back microvacuum functionality for compressed indexes.
> 3. Improve insertion speed. Insertions became significantly slower
> with compressed btree, which is obviously not what we do want.
> 4. Clean the code and comments, add related documentation.
> This doesn't apply cleanly against current git head. Have you caught
> up past commit 65c5fcd35?
Thank you for the notice. New patch is attached.
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
|Next Message||Peter Eisentraut||2016-01-28 16:17:27||insufficient qualification of some objects in dump files|
|Previous Message||Robert Haas||2016-01-28 16:12:15||Re: Fwd: Core dump with nested CREATE TEMP TABLE|