From: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
---|---|
To: | obartunov(at)gmail(dot)com |
Cc: | Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: GIN improvements part 1: additional information |
Date: | 2013-12-20 15:59:06 |
Message-ID: | 52B4694A.3060201@vmware.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 12/19/2013 10:44 AM, Heikki Linnakangas wrote:
> On 12/19/2013 08:37 AM, Oleg Bartunov wrote:
>> Guys,
>>
>> before digging deep into the art of comp/decomp world I'd like to know
>> if you familiar with results of
>> http://wwwconference.org/www2008/papers/pdf/p387-zhangA.pdf paper and
>> some newer research ?
>
> Yeah, I saw that paper.
>
>> Do we agree in what we really want ? Basically,
>> there are three main features: size, compression, decompression speed
>> - we should take two :)
>
> According to that Zhang et al paper you linked, the Vbyte method
> actually performs the worst on all of those measures. The other
> algorithms are quite similar in terms of size (PForDelta being the most
> efficient), while PForDelta is significantly faster to compress/decompress.
>
> Just by looking at those numbers, PForDelta looks like a clear winner.
> However, it operates on much bigger batches than the other algorithms; I
> haven't looked at it in detail, but Zhang et al used 128 integer
> batches, and they say that 32 integers is the minimum batch size. If we
> want to use it for the inline posting lists stored in entry tuples, that
> would be quite wasteful if there are only a few item pointers on the tuple.
>
> Also, in the tests I've run, the compression/decompression speed is not
> a significant factor in total performance, with either varbyte encoding
> or Simple9-like encoding I hacked together.
One disadvantage of Simple9 and other encodings that operate in batches
is that removing a value from the middle can increase the number of
bytes required for the remaining values. That's a problem during vacuum;
it's possible that after vacuuming away one item pointer, the remaining
items no longer fit on the page.
- Heikki
From | Date | Subject | |
---|---|---|---|
Next Message | Oskari Saarenmaa | 2013-12-20 17:01:37 | [PATCH] Make various variables read-only (const) |
Previous Message | Adrian Klaver | 2013-12-20 15:54:25 | Re: pg_upgrade & tablespaces |