Re: Zedstore - compressed in-core columnar storage

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Alexandra Wang <lewang(at)pivotal(dot)io>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Ashwin Agrawal <aagrawal(at)pivotal(dot)io>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Zedstore - compressed in-core columnar storage
Date: 2019-08-20 02:04:25
Message-ID: 20190820020425.GX11185@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Aug 19, 2019 at 04:15:30PM -0700, Alexandra Wang wrote:
> On Sun, Aug 18, 2019 at 12:35 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
>
> > . I was missing a way to check for compression ratio;
>
> Here are the ways to check compression ratio for zedstore:
>
> Table level:
> SELECT sum(uncompressedsz::numeric)/sum(totalsz) AS compratio FROM pg_zs_btree_pages(<tablename>);

postgres=# SELECT sum(uncompressedsz::numeric)/sum(totalsz) AS compratio FROM pg_zs_btree_pages('child.cdrs_huawei_pgwrecord_2019_07_01');
compratio | 4.2730304163521529

For a fair test, I created a separate ZFS tablspace for storing just a copy of
that table.

ts=# CREATE TABLE test TABLESPACE testcomp AS SELECT * FROM child.cdrs_huawei_pgwrecord_2019_07_01;
SELECT 39933381
Time: 882417.775 ms (14:42.418)

zfs/testJTP20190819 compressratio 6.01x -
zfs/testJTP20190819 compression gzip-1 inherited from zfs

> Per column level:
> select attno, count(*), sum(uncompressedsz::numeric)/sum(totalsz) as compratio from pg_zs_btree_pages(<tablename>) group by attno order by attno;

Order by 3; I see we have SOME highly compressed columns.

It's still surprising to me that's as low as it is, given their content: phone
numbers and IPv4 addresses in text form, using characters limited to
[[:digit:].]

(I realize we can probably save space using inet type.)

0 | 4743 | 1.00000000000000000000
32 | 21912 | 1.05953637381493823513
80 | 36441 | 1.2416446300175039
4 | 45059 | 1.3184106811322728
83 | 45059 | 1.3184106811322728
52 | 39208 | 1.3900788061770992
...
74 | 3464 | 10.8258665101057364
17 | 3535 | 10.8776086243096534
3 | 7092 | 11.0388009154683678
11 | 3518 | 11.4396055611832109
65 | 3333 | 14.6594723104237634
35 | 14077 | 15.1642131499381887
...
43 | 1601 | 21.4200106784573211
79 | 1599 | 21.4487670806076829
89 | 1934 | 23.6292134031933401
33 | 1934 | 23.6292134031933401

It seems clear the columns with high n_distinct have low compress ratio, and
columns with high compress ratio are those with n_distinct=1...

CREATE TEMP TABLE zs AS SELECT zs.*, n_distinct, avg_width, a.attname FROM (SELECT 'child.cdrs_huawei_pgwrecord_2019_07_01'::regclass t)t , LATERAL (SELECT attno, count(*), sum(uncompressedsz::numeric)/sum(totalsz) AS compratio FROM pg_zs_btree_pages(t) GROUP BY attno)zs , pg_attribute a, pg_class c, pg_stats s WHERE a.attrelid=t AND a.attnum=zs.attno AND c.oid=a.attrelid AND c.relname=s.tablename AND s.attname=a.attname;

n_distinct | compratio
------------+------------------------
217141 | 1.2416446300175039
154829 | 1.5306062496764190
144486 | 1.3900788061770992
128334 | 1.5395022739568842
121324 | 1.4005533187886683
86341 | 1.6262709389296389
84073 | 4.4379336418590519
65413 | 5.1890181028038757
63703 | 5.5029855093836425
63637 | 5.3648468796642262
46450 | 1.3184106811322728
46450 | 1.3184106811322728
43029 | 1.8003513772661308
39363 | 1.5845730687475706
36720 | 1.4751147557399539
36445 | 1.8403087513759131
36445 | 1.5453935268318613
11455 | 1.05953637381493823513
2862 | 9.8649823666870671
2625 | 2.3573614181847621
1376 | 1.7895024285340428
1335 | 2.2812551964262787
807 | 7.1192324141359373
610 | 7.9373623460089360
16 | 11.4396055611832109
10 | 5.5429763442365557
7 | 5.0440578041440675
7 | 5.2000132813261135
4 | 6.9741514753325536
4 | 4.2872818036896340
3 | 1.9080838412634827
3 | 2.9915954457453485
3 | 2.3056387009407882
2 | 10.8776086243096534
2 | 5.5950929307378287
2 | 18.5796576388128741
2 | 10.8258665101057364
2 | 9.1112820658021406
2 | 3.4986057630739795
2 | 4.6250999234025238
2 | 11.0388009154683678
1 | 15.1642131499381887
1 | 2.8855860118178798
1 | 23.6292134031933401
1 | 21.4200106784573211
[...]

> > it looks like zedstore
> > with lz4 gets ~4.6x for our largest customer's largest table. zfs using
> > compress=gzip-1 gives 6x compression across all their partitioned
> > tables,
> > and I'm surprised it beats zedstore .
> >
>
> What kind of tables did you use? Is it possible to give us the schema
> of the table? Did you perform 'INSERT INTO ... SELECT' or COPY?

I did this:

|time ~/src/postgresql.bin/bin/pg_restore /srv/cdrperfbackup/ts/final/child.cdrs_huawei_pgwrecord_2019_07_01 -f- |PGOPTIONS='-cdefault_table_access_method=zedstore' psql --port 5678 postgres --host /tmp
...
COPY 39933381
...
real 100m25.764s

child | cdrs_huawei_pgwrecord_2019_07_01 | table | pryzbyj | permanent | 8277 MB |

postgres=# SELECT array_to_string(array_agg(format_type(atttypid, atttypmod) ||CASE WHEN attnotnull THEN ' not null' ELSE '' END ORDER BY attnum),',') FROM pg_attribute WHERE attrelid='child.cdrs_huawei_pgwrecord_2019_07_01'::regclass AND attnum>0;
array_to_string | text not null,text,text not null,text not null,text not null,text,text,text,boolean,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,timestamp without time zone not null,bigint not null,text not null,text,text,text,text,text,text,text,text,text,text not null,text,boolean,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,text,timestamp with time zone,timestamp with time zone,text,text,boolean,text,text,boolean,boolean,text not null,text not null

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2019-08-20 04:34:45 Re: Cleanup isolation specs from unused steps
Previous Message Michael Paquier 2019-08-20 02:00:21 Re: Cleanup isolation specs from unused steps