Re: [WIP] Effective storage of duplicates in B-tree index.

From: Alexandr Popov <a(dot)popov(at)postgrespro(dot)ru>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [WIP] Effective storage of duplicates in B-tree index.
Date: 2016-03-23 15:30:14
Message-ID: 56F2B686.9070602@postgrespro.ru
Lists: pgsql-hackers

On 18.03.2016 20:19, Anastasia Lubennikova wrote:
> Please, find the new version of the patch attached. Now it has WAL
> functionality.
>
> Detailed description of the feature you can find in README draft
> https://goo.gl/50O8Q0
>
> This patch is pretty complicated, so I ask everyone, who interested in
> this feature,
> to help with reviewing and testing it. I will be grateful for any
> feedback.
> But please, don't complain about code style, it is still work in
> progress.
>
> Next things I'm going to do:
> 1. More debugging and testing. I'm going to attach in next message
> couple of sql scripts for testing.
> 2. Fix NULLs processing
> 3. Add a flag into pg_index, that allows to enable/disable compression
> for each particular index.
> 4. Recheck locking considerations. I tried to write code as less
> invasive as possible, but we need to make sure that algorithm is still
> correct.
> 5. Change BTMaxItemSize
> 6. Bring back microvacuum functionality.
>

Hi, hackers.

This is my first review, so please don't be too strict with me.

I tested this patch on the following table:
create table message
(
id serial,
usr_id integer,
text text
);
CREATE INDEX message_usr_id ON message (usr_id);
The table has 10000000 records.
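
For anyone who wants to reproduce something similar, a data set like this could be generated roughly as follows. This is only a sketch: the actual load script is not included in this message, so the use of generate_series() and random(), the md5() filler text, and the choice of 100000 distinct keys are all assumptions for illustration.

```sql
-- Hypothetical load script; the original load method is not specified.
-- Inserts 10,000,000 rows whose usr_id values are drawn at random
-- from the range 1..100000, giving roughly 100,000 distinct keys.
INSERT INTO message (usr_id, text)
SELECT floor(random() * 100000)::integer + 1,  -- random key in 1..100000
       md5(i::text)                            -- arbitrary filler text
FROM generate_series(1, 10000000) AS i;
```

Changing the multiplier (100000 here) changes the number of distinct keys; replacing the random expression with something like `i % 100000` would give a sequential rather than random insertion order.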

I found the following:
The fewer unique keys there are, the smaller the index.

The following two tables demonstrate it.
New B-tree
 Unique keys (usr_id) | Index size | Creation time
----------------------+------------+-----------------
             10000000 | 214 MB     | 00:00:34.193441
              3333333 | 214 MB     | 00:00:45.731173
              2000000 | 129 MB     | 00:00:41.445876
              1000000 | 129 MB     | 00:00:38.455616
               100000 | 86 MB      | 00:00:40.887626
                10000 | 79 MB      | 00:00:47.199774

Old B-tree
 Unique keys (usr_id) | Index size | Creation time
----------------------+------------+-----------------
             10000000 | 214 MB     | 00:00:35.043677
              3333333 | 286 MB     | 00:00:40.922845
              2000000 | 300 MB     | 00:00:46.454846
              1000000 | 278 MB     | 00:00:42.323525
               100000 | 287 MB     | 00:00:47.438132
                10000 | 280 MB     | 00:01:00.307873
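
Numbers of this kind could be collected in psql along the following lines. This is a sketch under assumptions, not the script actually used for the tables above: it assumes the index is rebuilt after the data is loaded and that the build time is read from psql's \timing output.

```sql
-- Hypothetical measurement steps, run in psql.
\timing on

-- Rebuild the index on the loaded table; \timing reports the elapsed time.
DROP INDEX IF EXISTS message_usr_id;
CREATE INDEX message_usr_id ON message (usr_id);

-- Report the number of distinct keys and the on-disk size of the index.
SELECT count(DISTINCT usr_id) AS unique_keys,
       pg_size_pretty(pg_relation_size('message_usr_id')) AS index_size
FROM message;
```

pg_relation_size() reports the size of the index's main fork only, which is the figure of interest when comparing the two B-tree layouts.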

I inserted the data both randomly and sequentially; the insertion order
did not affect the index size.
The time to select, insert, and update random rows did not change. That
is great, but it certainly needs more detailed study.

Alexander Popov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
