Re: [WIP] Effective storage of duplicates in B-tree index.

From: Thom Brown <thom(at)linux(dot)com>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [WIP] Effective storage of duplicates in B-tree index.
Date: 2016-02-02 11:59:21
Message-ID: CAA-aLv6CD8VZYo40unBewD+8XfQ-9jT0KcPYjQObBiaUsKicgA@mail.gmail.com
Lists: pgsql-hackers

On 2 February 2016 at 11:47, Anastasia Lubennikova
<a(dot)lubennikova(at)postgrespro(dot)ru> wrote:
>
>
> 29.01.2016 20:43, Thom Brown:
>
>> On 29 January 2016 at 16:50, Anastasia Lubennikova
>> <a(dot)lubennikova(at)postgrespro(dot)ru> wrote:
>>>
>>> 29.01.2016 19:01, Thom Brown:
>>>>
>>>> On 29 January 2016 at 15:47, Aleksander Alekseev
>>>> <a(dot)alekseev(at)postgrespro(dot)ru> wrote:
>>>>>
>>>>> I tested this patch on x64 and ARM servers for a few hours today. The
>>>>> only problem I could find is that INSERT works considerably slower
>>>>> after applying the patch. Besides that, everything looks fine - no
>>>>> crashes, tests pass, memory doesn't seem to leak, etc.
>>>
>>> Thank you for testing. I rechecked that, and insertions are really very,
>>> very, very slow. It seems like a bug.
>>>
>>>>>> Okay, now for some badness. I've restored a database containing 2
>>>>>> tables, one 318MB, another 24kB. The 318MB table contains 5 million
>>>>>> rows with a sequential id column. I get a problem if I try to delete
>>>>>> many rows from it:
>>>>>> # delete from contacts where id % 3 != 0 ;
>>>>>> WARNING: out of shared memory
>>>>>> WARNING: out of shared memory
>>>>>> WARNING: out of shared memory
>>>>>
>>>>> I didn't manage to reproduce this. Thom, could you describe exact steps
>>>>> to reproduce this issue please?
>>>>
>>>> Sure, I used my pg_rep_test tool to create a primary (pg_rep_test
>>>> -r0), which creates an instance with the following custom config:
>>>>
>>>> shared_buffers = 8MB
>>>> max_connections = 7
>>>> wal_level = 'hot_standby'
>>>> cluster_name = 'primary'
>>>> max_wal_senders = 3
>>>> wal_keep_segments = 6
>>>>
>>>> Then create a pgbench data set (I didn't originally use pgbench, but
>>>> you can get the same results with it):
>>>>
>>>> createdb -p 5530 pgbench
>>>> pgbench -p 5530 -i -s 100 pgbench
>>>>
>>>> And delete some stuff:
>>>>
>>>> thom(at)swift:~/Development/test$ psql -p 5530 pgbench
>>>> Timing is on.
>>>> psql (9.6devel)
>>>> Type "help" for help.
>>>>
>>>>
>>>> ➤ psql://thom(at)[local]:5530/pgbench
>>>>
>>>> # DELETE FROM pgbench_accounts WHERE aid % 3 != 0;
>>>> WARNING: out of shared memory
>>>> WARNING: out of shared memory
>>>> WARNING: out of shared memory
>>>> WARNING: out of shared memory
>>>> WARNING: out of shared memory
>>>> WARNING: out of shared memory
>>>> WARNING: out of shared memory
>>>> ...
>>>> WARNING: out of shared memory
>>>> WARNING: out of shared memory
>>>> DELETE 6666667
>>>> Time: 22218.804 ms
>>>>
>>>> There were 358 lines of that warning message. I don't get these
>>>> messages without the patch.
>>>>
>>>> Thom
>>>
>>> Thank you for this report.
>>> I tried to reproduce it, but I couldn't. Debugging will be much easier now.
>>>
>>> I hope I'll fix these issues within the next few days.
>>>
>>> BTW, I found a silly mistake: the previous patch contained some unrelated
>>> changes. I fixed it in the new version (attached).
>>
>> Thanks. Well, I've tested this latest patch, and the warnings are no
>> longer generated. However, the index sizes show that the patch
>> doesn't seem to be doing its job, so I'm wondering if you removed too
>> much from it.
>
>
> Huh, this patch seems to be enchanted :) It works fine for me. Did you run
> "make distclean"?

Yes. Just tried it again:

git clean -fd
git stash
make distclean
patch -p1 < ~/Downloads/btree_compression_2.0.patch
../dopg.sh (script I've always used to build with)
pg_ctl start
createdb pgbench
pgbench -i -s 100 pgbench
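
For the record, a quick sanity check against a stale build would be something
along these lines (just a sketch, not part of the steps above):

git diff --stat                        # should list only the files touched by the patch
psql pgbench -c "SELECT version();"    # confirm the server on this port is the freshly built 9.6devel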

$ psql pgbench
Timing is on.
psql (9.6devel)
Type "help" for help.

➤ psql://thom(at)[local]:5488/pgbench

# \di+
                                     List of relations
 Schema |         Name          | Type  | Owner |      Table       |  Size  | Description
--------+-----------------------+-------+-------+------------------+--------+-------------
 public | pgbench_accounts_pkey | index | thom  | pgbench_accounts | 214 MB |
 public | pgbench_branches_pkey | index | thom  | pgbench_branches | 24 kB  |
 public | pgbench_tellers_pkey  | index | thom  | pgbench_tellers  | 48 kB  |
(3 rows)

Previously, this would show an index size of 87MB for pgbench_accounts_pkey.
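
For reference, the same check can be done without \di+ (just a sketch using
the standard size functions):

# SELECT pg_size_pretty(pg_relation_size('pgbench_accounts_pkey'));

With the compression working, that should come back at around 87 MB rather
than the 214 MB shown above.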

> Anyway, I'll send a new version soon.
> I'm just writing here to say that I haven't disappeared and I do remember
> the issue.
> I've even almost fixed the insert speed problem, but I'm very, very busy
> this week.
> I'll send an updated patch next week as soon as possible.

Thanks.

> Thank you for your attention to this work.

Thanks for your awesome patches.

Thom
