Re: [REVIEW] Re: Compression of full-page-writes

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: "Syed, Rahila" <Rahila(dot)Syed(at)nttdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [REVIEW] Re: Compression of full-page-writes
Date: 2015-03-10 12:55:12
Message-ID: CAB7nPqTgC=9wzDpoxecitKwUcnDmNa8epMfGaC=U5Vz7b1ZUvw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 9, 2015 at 9:08 PM, Michael Paquier wrote:
> On Mon, Mar 9, 2015 at 4:29 PM, Fujii Masao wrote:
>> Thanks for updating the patch! Attached is the refactored version of the patch.

Fujii-san and I had a short chat about tuning the PGLZ strategy that the
patch now uses as PGLZ_strategy_default (at least 25% of compression,
etc.). In particular, min_input_size, which is now set at 32B, is too
low: knowing that the minimum fillfactor of a relation page is 10%, a
page will essentially always clear that threshold.
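As a back-of-the-envelope check (a sketch only: the 8kB default BLCKSZ and the 24B page header are standard PostgreSQL values, and the 10% bound comes from the minimum fillfactor mentioned above), even a minimally-filled page carries far more than 32B once its hole is removed:

```python
# Sketch: is min_input_size = 32B ever a meaningful cutoff for
# full-page writes? Assumes the default 8kB block size and the
# 24-byte PageHeaderData header.
BLCKSZ = 8192            # default PostgreSQL block size
PAGE_HEADER = 24         # size of the page header
MIN_FILLFACTOR = 0.10    # smallest fillfactor a heap relation accepts

# Rough lower bound on the data a minimally-filled page carries
# (hole removed): header plus 10% of the block.
min_page_data = PAGE_HEADER + int(BLCKSZ * MIN_FILLFACTOR)
print(min_page_data)
```

So a full-page image is roughly 800B at the very least, making a 32B threshold effectively a no-op.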

For example, using the extension attached to this email, which I
developed after pglz was moved to libpgcommon and which is able to
compress and decompress bytea strings (it contains as well a function
able to get a relation page without its hole, feel free to use it), I
am seeing that we can gain quite a lot of space even with some
hardly-compressible data like UUIDs or random float values (pages are
compressed without their hole):
1) Float table:
=# create table float_tab (id float);
CREATE TABLE
=# insert into float_tab select random() from generate_series(1, 20);
INSERT 0 20
=# SELECT bytea_size(compress_data(page)) AS compress_size,
bytea_size(page) AS raw_size_no_hole FROM
get_raw_page('float_tab'::regclass, 0, false);
-[ RECORD 1 ]----+----
compress_size | 329
raw_size_no_hole | 744
=# SELECT bytea_size(compress_data(page)) AS compress_size,
bytea_size(page) AS raw_size_no_hole FROM
get_raw_page('float_tab'::regclass, 0, false);
-[ RECORD 1 ]----+-----
compress_size | 1753
raw_size_no_hole | 4344
So that's more or less 60% saved...
2) UUID table
=# SELECT bytea_size(compress_data(page)) AS compress_size,
bytea_size(page) AS raw_size_no_hole FROM
get_raw_page('uuid_tab'::regclass, 0, false);
-[ RECORD 1 ]----+----
compress_size | 590
raw_size_no_hole | 904
=# insert into uuid_tab select gen_random_uuid() from generate_series(1, 100);
INSERT 0 100
=# SELECT bytea_size(compress_data(page)) AS compress_size,
bytea_size(page) AS raw_size_no_hole FROM
get_raw_page('uuid_tab'::regclass, 0, false);
-[ RECORD 1 ]----+-----
compress_size | 3338
raw_size_no_hole | 5304
And in this case we are close to 40% saved...
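Working the reported sizes through (plain arithmetic on the query output above; the row-count labels for the larger float_tab page are my guess, since the intermediate insert is not shown):

```python
# Savings implied by the numbers above: 1 - compressed/raw.
cases = {
    "float_tab, first page":  (329, 744),
    "float_tab, fuller page": (1753, 4344),
    "uuid_tab, first page":   (590, 904),
    "uuid_tab, +100 rows":    (3338, 5304),
}
for name, (comp, raw) in cases.items():
    print(f"{name}: {100 * (1 - comp / raw):.0f}% saved")
```

The fuller float page lands at ~60% saved and the fuller UUID page at ~37%, matching the figures quoted above.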

At least, knowing that the header already uses 24B on a page, what
about increasing min_input_size to something like 128B or 256B? I don't
think that this is a blocker for this patch, as most relation pages are
going to contain far more data than that and will be unconditionally
compressed, but there is definitely something we could do in this area
later on; perhaps we could even improve the other parameters, like the
minimum compression rate. So that's something to keep in mind...
--
Michael

Attachment Content-Type Size
compress_test.tar.gz application/x-gzip 3.2 KB
