Re: Zedstore - compressed in-core columnar storage

From: Ashwin Agrawal <aagrawal(at)pivotal(dot)io>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Zedstore - compressed in-core columnar storage
Date: 2019-04-15 17:50:21
Message-ID: CALfoeitUdo62e-HTYqBxeBb+d0hW4+uXWggdWh5t_pbYTcE9Ng@mail.gmail.com
Lists: pgsql-hackers

On Mon, Apr 15, 2019 at 10:33 AM Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:

> On Mon, Apr 15, 2019 at 09:29:37AM -0700, Ashwin Agrawal wrote:
> > On Sun, Apr 14, 2019 at 9:40 AM Tomas Vondra
> > <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> >
> > On Thu, Apr 11, 2019 at 06:20:47PM +0300, Heikki Linnakangas wrote:
> > >On 11/04/2019 17:54, Tom Lane wrote:
> > >>Ashwin Agrawal <aagrawal(at)pivotal(dot)io> writes:
> > >>>Thank you for trying it out. Yes, noticed for certain patterns
> > >>>pg_lzcompress() actually requires much larger output buffers. Like for
> > >>>one 86 len source it required 2296 len output buffer. Current zedstore
> > >>>code doesn’t handle this case and errors out. LZ4 for same patterns
> > >>>works fine, would highly recommend using LZ4 only, as anyways speed is
> > >>>very fast as well with it.
> > >>
> > >>You realize of course that *every* compression method has some inputs
> > >>that it makes bigger. If your code assumes that compression always
> > >>produces a smaller string, that's a bug in your code, not the
> > >>compression algorithm.
> > >
> > >Of course. The code is not making that assumption, although clearly
> > >there is a bug there somewhere because it throws that error. It's
> > >early days..
> > >
> > >In practice it's easy to weasel out of that, by storing the data
> > >uncompressed, if compression would make it longer. Then you need an
> > >extra flag somewhere to indicate whether it's compressed or not. It
> > >doesn't break the theoretical limit because the actual stored length
> > >is then original length + 1 bit, but it's usually not hard to find a
> > >place for one extra bit.
> > >
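
To illustrate the fallback Heikki describes, here is a minimal sketch; the
item layout and the compressor helper below are made up for illustration,
they are not actual zedstore code:

    #include <stdint.h>
    #include <string.h>

    #define ZS_ITEM_COMPRESSED 0x01

    typedef struct
    {
        uint16_t flags;      /* ZS_ITEM_COMPRESSED if payload is compressed */
        uint32_t len;        /* length of payload as stored */
        char     payload[];  /* compressed or raw bytes */
    } zs_item_sketch;

    /* hypothetical compressor: returns compressed size, or -1 if the
     * result would not be smaller than srclen */
    extern int32_t zs_try_compress(char *dst, const char *src, uint32_t srclen);

    static void
    zs_store_item(zs_item_sketch *item, const char *src, uint32_t srclen)
    {
        int32_t clen = zs_try_compress(item->payload, src, srclen);

        if (clen > 0 && (uint32_t) clen < srclen)
        {
            /* compression won: store compressed bytes, set the extra bit */
            item->flags = ZS_ITEM_COMPRESSED;
            item->len = (uint32_t) clen;
        }
        else
        {
            /* compression would expand: store the raw bytes, flag clear */
            item->flags = 0;
            item->len = srclen;
            memcpy(item->payload, src, srclen);
        }
    }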
> >
> > Don't we already have that flag, though? I see ZSCompressedBtreeItem has
> > t_flags, and there's ZSBT_COMPRESSED, but maybe it's more complicated.
> >
> > The flag ZSBT_COMPRESSED differentiates between a container (compressed)
> > item and a plain (uncompressed) item. Current code is written such that
> > within a container (compressed) item, all the data is compressed. If the
> > need exists to store some part of the data uncompressed inside a
> > container item, then this additional flag would be required to indicate
> > that. Hence it's different from ZSBT_COMPRESSED. I am thinking one way
> > could be to just not store the datum in a container item if it can't be
> > compressed, and instead store it as a plain item with uncompressed data;
> > then this additional flag won't be required. Will know more once I write
> > the code for this.
>
> I see. Perhaps it'd be better to call the flag ZSBT_CONTAINER, when it
> means "this is a container". And then have another flag to track whether
> the container is compressed or not. But as I suggested elsewhere in this
> thread, I think it might be better to store some ID of the compression
> algorithm used instead of a simple flag.
>
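
As a sketch of that layout (these particular names and values are
hypothetical, just to show the idea):

    #define ZSBT_CONTAINER   0x01    /* item is a container */
    #define ZSBT_COMPRESSED  0x02    /* container payload is compressed */

    /* compression algorithm IDs, stored next to the flags */
    #define ZS_COMPRESSION_PGLZ  1
    #define ZS_COMPRESSION_LZ4   2
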
> FWIW when I had to deal with incremental compression (adding data into
> already compressed buffers), which is what seems to be happening here, I
> found it very useful/efficient to allow partially compressed buffers and
> only trigger recompression when absolutely needed.
>
> Applied to this case, the container would first store compressed chunk,
> followed by raw (uncompressed) data. Say, like this:
>
> typedef struct ZSContainerData
> {
>     /* header etc. */
>
>     int nbytes;        /* total bytes in data */
>     int ncompressed;   /* ncompressed <= nbytes; fully compressed when
>                         * (ncompressed == nbytes) */
>
>     char data[FLEXIBLE_ARRAY_MEMBER];
> } ZSContainerData;
>
> When adding a value to the buffer, it'd be simply appended to the data
> array. When the container would grow too much (can't fit on the page or
> something), recompression is triggered.
>
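
To make sure I read the scheme right, in code the insert path would be
roughly this (a sketch against the struct above; the helper names are
invented):

    #include <stdbool.h>
    #include <string.h>

    /* hypothetical: decompress, fold in the raw tail, recompress, so that
     * ncompressed == nbytes afterwards */
    extern void zs_container_recompress(ZSContainerData *cont);

    static bool
    zs_container_append(ZSContainerData *cont, const char *datum,
                        int datumlen, int maxsize)
    {
        if (cont->nbytes + datumlen <= maxsize)
        {
            /* cheap path: append raw bytes after the compressed chunk */
            memcpy(cont->data + cont->nbytes, datum, datumlen);
            cont->nbytes += datumlen;
            return true;
        }

        /* too big: recompress everything, then retry once */
        zs_container_recompress(cont);

        if (cont->nbytes + datumlen <= maxsize)
        {
            memcpy(cont->data + cont->nbytes, datum, datumlen);
            cont->nbytes += datumlen;
            return true;
        }

        return false;   /* caller must split to a new container/page */
    }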

I think what you suggested here is exactly how it's handled currently, just
that the mechanics are a little different. Plain items are added to the page
as insertions are performed. When the page becomes full, compression is
triggered and a container item is created to store the compressed data. New
insertions are then again stored as plain items, and once the page becomes
full they too are compressed into a new container item. So compressed data
is never compressed again. In effect, the plain items on the page act as the
uncompressed data section you mention above. A page can hold a mix of any
number of plain and container items.
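
In pseudocode the current flow is roughly this (a sketch; the function names
are made up, not the actual zedstore functions, and Page stands for the
usual PostgreSQL page type):

    /* hypothetical helpers, standing in for the real zedstore routines */
    extern bool page_has_room(Page page, int datumlen);
    extern void add_plain_item(Page page, const char *datum, int datumlen);
    extern void compress_plain_items_into_container(Page page);
    extern void split_page_and_insert(Page page, const char *datum,
                                      int datumlen);

    static void
    zsbt_insert_sketch(Page page, const char *datum, int datumlen)
    {
        if (!page_has_room(page, datumlen))
        {
            /*
             * Page full: compress all plain items on the page into one new
             * container item.  Existing container items are left untouched,
             * so data is never compressed twice.
             */
            compress_plain_items_into_container(page);
        }

        if (page_has_room(page, datumlen))
            add_plain_item(page, datum, datumlen);   /* uncompressed */
        else
            split_page_and_insert(page, datum, datumlen);
    }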
