Re: Zedstore - compressed in-core columnar storage

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ashwin Agrawal <aagrawal(at)pivotal(dot)io>
Cc: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Zedstore - compressed in-core columnar storage
Date: 2019-04-11 15:20:47
Message-ID: d4bfe7b0-0836-06b1-6b03-f5874b5a2b00@iki.fi
Lists: pgsql-hackers

On 11/04/2019 17:54, Tom Lane wrote:
> Ashwin Agrawal <aagrawal(at)pivotal(dot)io> writes:
>> Thank you for trying it out. Yes, we noticed that for certain patterns pg_lzcompress() actually requires a much larger output buffer. For example, one 86-byte source required a 2296-byte output buffer. The current zedstore code doesn't handle this case and errors out. LZ4 handles the same patterns fine; I would highly recommend using LZ4 only, as it is very fast as well.
>
> You realize of course that *every* compression method has some inputs that
> it makes bigger. If your code assumes that compression always produces a
> smaller string, that's a bug in your code, not the compression algorithm.

Of course. The code doesn't make that assumption, although there is
clearly a bug somewhere, since it throws that error. It's early
days...

In practice it's easy to weasel out of that by storing the data
uncompressed if compression would make it longer. You then need an
extra flag somewhere to indicate whether it's compressed or not. That
doesn't break the theoretical limit, because the actual stored length
is then original length + 1 bit, but it's usually not hard to find a
place for one extra bit.

- Heikki
