Re: Zedstore - compressed in-core columnar storage

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Soumyadeep Chakraborty <soumyadeep2007(at)gmail(dot)com>, Alexandra Wang <lewang(at)pivotal(dot)io>, Taylor Vesely <tvesely(at)pivotal(dot)io>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Ashwin Agrawal <aagrawal(at)pivotal(dot)io>, DEV_OPS <devops(at)ww-it(dot)cn>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Zedstore - compressed in-core columnar storage
Date: 2020-11-16 16:07:29
Message-ID: f090d405-c5ac-835e-fdb3-0c8d9c850012@enterprisedb.com
Lists: pgsql-hackers


On 11/16/20 1:59 PM, Merlin Moncure wrote:
> On Thu, Nov 12, 2020 at 4:40 PM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>          master   zedstore/pglz   zedstore/lz4
>> -------------------------------------------------
>> copy       1855           68092           2131
>> dump        751             905            811
>>
>> And the size of the lineitem table (as shown by \d+) is:
>>
>> master: 64GB
>> zedstore/pglz: 51GB
>> zedstore/lz4: 20GB
>>
>> It's mostly expected lz4 beats pglz in performance and compression
>> ratio, but this seems a bit too extreme I guess. Per past benchmarks
>> (e.g. [1] and [2]) the difference in compression/decompression time
>> should be maybe 1-2x or something like that, not 35x like here.
>
> I can't speak to the ratio, but in basic backup/restore scenarios pglz
> is absolutely killing me; performance is just awful; we are CPU-bound
> in backups throughout the department. Installations defaulting to
> pglz will make this feature show very poorly.
>

Maybe. I'm not disputing that pglz is considerably slower than lz4, but
judging by previous benchmarks I'd expect the compression to be slower
maybe by a factor of ~2x. So the 30x difference is suspicious. Similarly
for the compression ratio - lz4 is great, but it seems strange that it's
less than 1/2 the size of pglz (20GB vs. 51GB), which is why I'm
speculating that something else is going on.
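
For reference, the ratios behind these figures can be recomputed from the
numbers quoted above. This is only a minimal sketch: the dictionary and
function names are illustrative, not taken from any benchmark script, and
the copy timings are used in whatever unit the quoted table reports.

```python
# Figures copied from the quoted benchmark table and \d+ sizes.
# Names below are illustrative assumptions, not part of the benchmark.
copy_time = {"master": 1855, "zedstore/pglz": 68092, "zedstore/lz4": 2131}
table_size_gb = {"master": 64, "zedstore/pglz": 51, "zedstore/lz4": 20}


def ratio(values, a, b):
    """Return how many times larger values[a] is than values[b]."""
    return values[a] / values[b]


pglz_vs_lz4_copy = ratio(copy_time, "zedstore/pglz", "zedstore/lz4")
pglz_vs_master_copy = ratio(copy_time, "zedstore/pglz", "master")
lz4_vs_pglz_size = ratio(table_size_gb, "zedstore/lz4", "zedstore/pglz")

print(f"copy, pglz vs lz4:    {pglz_vs_lz4_copy:.1f}x")
print(f"copy, pglz vs master: {pglz_vs_master_copy:.1f}x")
print(f"size, lz4 / pglz:     {lz4_vs_pglz_size:.2f}")
```

With the quoted numbers this gives roughly 32x for pglz vs. lz4 on copy,
about 37x for pglz vs. master, and an lz4 table about 0.39 times the size
of the pglz one.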

As for the "pglz will make this feature show very poorly" point, I think
that depends. We may end up with pglz doing pretty well (compared to
heap), but lz4 will probably outperform it. OTOH for various use cases
it may be more efficient to use something else with a worse compression
ratio that allows execution on compressed data, etc.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
