Re: Zedstore - compressed in-core columnar storage

From: Alexandra Wang <lewang(at)pivotal(dot)io>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Ashwin Agrawal <aagrawal(at)pivotal(dot)io>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Zedstore - compressed in-core columnar storage
Date: 2019-08-19 23:15:30
Message-ID: CACiyaSr3EEMR=wjdhf9XZiBuOgB0bqdwjPyS5Yh63d-fpACBPQ@mail.gmail.com
Lists: pgsql-hackers

On Sun, Aug 18, 2019 at 12:35 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:

>
> . I was missing a way to check for compression ratio;

Here are the ways to check compression ratio for zedstore:

Table level:
select sum(uncompressedsz::numeric) / sum(totalsz) as compratio from
pg_zs_btree_pages(<tablename>);

Per column level:
select attno, count(*), sum(uncompressedsz::numeric) / sum(totalsz) as
compratio from pg_zs_btree_pages(<tablename>) group by attno order by attno;
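
To make the arithmetic in the two queries concrete, here is a small sketch in
Python. The page rows are made-up numbers, not real pg_zs_btree_pages output,
and the helper names are hypothetical. Note that both queries divide the summed
sizes; they do not average per-page ratios, so large pages weigh more.

```python
from collections import defaultdict

# Hypothetical rows: (attno, uncompressedsz, totalsz). Illustrative values only.
pages = [
    (1, 16000, 4000),
    (1, 8000, 4000),
    (2, 6000, 6000),
]

def table_ratio(rows):
    """Table level: sum(uncompressedsz) / sum(totalsz) over all pages."""
    return sum(u for _, u, _ in rows) / sum(t for _, _, t in rows)

def ratio_by_attno(rows):
    """Per column level: the same ratio, grouped by attno and ordered by attno."""
    uncompressed = defaultdict(int)
    total = defaultdict(int)
    for attno, u, t in rows:
        uncompressed[attno] += u
        total[attno] += t
    return {a: uncompressed[a] / total[a] for a in sorted(uncompressed)}

if __name__ == "__main__":
    print(table_ratio(pages))
    print(ratio_by_attno(pages))
```

With these made-up rows, column 1 compresses well and column 2 does not
compress at all, which is the kind of difference the per-column query is meant
to expose.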

> it looks like zedstore
> with lz4 gets ~4.6x for our largest customer's largest table. zfs using
> compress=gzip-1 gives 6x compression across all their partitioned
> tables,
> and I'm surprised it beats zedstore .
>

What kind of tables did you use? Is it possible to give us the schema
of the table? Did you perform 'INSERT INTO ... SELECT' or COPY?
Currently COPY gives better compression ratios than single INSERT
statements because it generates fewer metadata pages. The per-column
query above will show which columns have lower compression ratios.

We plan to add other compression techniques, such as RLE and delta
encoding, which, used along with LZ4, should give better compression
ratios for a column store.
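
For readers not familiar with the two techniques, a minimal sketch in Python
follows. It is only a generic illustration of run-length and delta encoding on
a column of values; it is not zedstore code, and the function names are
hypothetical.

```python
def rle_encode(values):
    """Run-length encoding: collapse consecutive repeats into (value, count)."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]

def rle_decode(runs):
    """Expand (value, count) pairs back into the original sequence."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out

def delta_encode(values):
    """Delta encoding: keep the first value, then store successive differences."""
    if not values:
        return []
    out = [values[0]]
    for prev, cur in zip(values, values[1:]):
        out.append(cur - prev)
    return out

def delta_decode(deltas):
    """Rebuild the original values with a running sum over the deltas."""
    out = []
    running = 0
    for i, d in enumerate(deltas):
        running = d if i == 0 else running + d
        out.append(running)
    return out

if __name__ == "__main__":
    print(rle_encode(["a", "a", "a", "b", "b", "a"]))
    print(delta_encode([100, 101, 103, 106]))
```

RLE helps when a column has long runs of the same value, and delta encoding
helps when values change slowly (for example, increasing ids or timestamps).
The small deltas and run counts are then easier for a general-purpose
compressor to shrink further.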
