Re: Compression of full-page-writes

From: KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Compression of full-page-writes
Date: 2013-10-21 11:10:50
Message-ID: 52650BBA.2050403@lab.ntt.co.jp
Lists: pgsql-hackers

(2013/10/19 14:58), Amit Kapila wrote:
> On Tue, Oct 15, 2013 at 11:41 AM, KONDO Mitsumasa
> <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> I think in general also snappy is mostly preferred for its low CPU
> usage, not for its compression ratio, but overall my vote is also for snappy.
I think low CPU usage is the most important factor in WAL compression. WAL
writes are sequential, so a small improvement in compression ratio barely
changes PostgreSQL's performance, especially when a RAID card with a
write-back cache absorbs the writes anyway. Furthermore, since PostgreSQL
does this work in a single process, a compression algorithm with high CPU
usage will reduce performance.

>> I found a compression algorithm test for HBase. I haven't read it in
>> detail, but it indicates that the snappy algorithm gets the best performance.
>>
>> http://blog.erdemagaoglu.com/post/4605524309/lzo-vs-snappy-vs-lzf-vs-zlib-a-comparison-of
>
> The dataset used for performance is quite different from the data
> which we are talking about here (WAL).
> "These are the scores for a data which consist of 700kB rows, each
> containing a binary image data. They probably won’t apply to things
> like numeric or text data."
Yes, you are right. We need to test the compression algorithms on actual WAL
data; a rough sketch of such a test is below.
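
For instance, a micro-benchmark along these lines could compare snappy and
zlib on a real WAL segment. This is only a minimal sketch, assuming snappy-c
and zlib are installed (link with -lsnappy -lz); the segment file name is
just an example.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <snappy-c.h>
#include <zlib.h>

#define WAL_SEG_SIZE (16 * 1024 * 1024)	/* default XLOG segment size */

int
main(void)
{
	FILE	   *f = fopen("000000010000000000000001", "rb");	/* example name */
	char	   *in = malloc(WAL_SEG_SIZE);
	char	   *sout;
	Bytef	   *zout;
	size_t		inlen;
	size_t		slen;
	uLongf		zlen;
	clock_t		t0;

	if (f == NULL || in == NULL ||
		(inlen = fread(in, 1, WAL_SEG_SIZE, f)) == 0)
	{
		perror("read WAL segment");
		return 1;
	}
	fclose(f);

	/* snappy */
	slen = snappy_max_compressed_length(inlen);
	sout = malloc(slen);
	t0 = clock();
	snappy_compress(in, inlen, sout, &slen);
	printf("snappy : %.1f%% of original, %.0f ms\n",
		   100.0 * slen / inlen,
		   1000.0 * (clock() - t0) / CLOCKS_PER_SEC);

	/* zlib at its fastest level */
	zlen = compressBound(inlen);
	zout = malloc(zlen);
	t0 = clock();
	compress2(zout, &zlen, (const Bytef *) in, inlen, Z_BEST_SPEED);
	printf("zlib -1: %.1f%% of original, %.0f ms\n",
		   100.0 * zlen / inlen,
		   1000.0 * (clock() - t0) / CLOCKS_PER_SEC);

	return 0;
}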

>> I think it is better for the community to make a best-effort choice than
>> for me alone to pick the best algorithm through strict testing.
>
> Sure, it is good to make an effort to select the best algorithm, but if
> you are combining this patch with the inclusion of a new compression
> algorithm in PG, it can only make the patch take much longer.
I think that once our direction is clearly decided, the patch will be easy to
write. The direction of the compression patch has still not become clear; if
it stays that way, it will become a troublesome patch, like the sync-rep
patch was.

> In general, my thinking is that we should prefer compression to reduce
> IO (WAL volume), because reducing WAL volume has other benefits as
> well like sending it to subscriber nodes. I think it will help cases
> where due to less n/w bandwidth, the disk allocated for WAL becomes
> full due to high traffic on master and then users need some
> alternative methods to handle such situations.
Are you talking about archived WAL files? We can easily reduce their volume by
adding a compression command to the copy command we set in archive_command, as
sketched below.
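
A minimal illustration (the archive directory path is only a placeholder):

    archive_command = 'gzip < %p > /mnt/server/archivedir/%f.gz'

Here %p and %f are the usual archive_command substitutions for the segment's
path and file name.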

> I think many users would like to use a method which can reduce WAL
> volume, and the users who don't find it useful enough in their
> environments, due to a decrease in TPS or an insignificant reduction in
> WAL, have the option to disable it.
I favor selecting a compression algorithm for higher performance. If we need
to compress WAL files further, even at the cost of some performance, we can
switch the archive copy command to a high-compression algorithm and document
how to compress archived WAL files in archive_command (a sketch follows). Is
that wrong? In fact, many NoSQL systems use snappy precisely for higher
performance.
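
For illustration only (the path and the compression level are placeholders),
trading archiving speed for a better ratio could look like:

    archive_command = 'bzip2 -9 < %p > /mnt/server/archivedir/%f.bz2'
    restore_command = 'bzip2 -dc /mnt/server/archivedir/%f.bz2 > %p'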

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
