Re: [REVIEW] Re: Compression of full-page-writes

From: "ktm(at)rice(dot)edu" <ktm(at)rice(dot)edu>
To: Arthur Silva <arthurprs(at)gmail(dot)com>
Cc: Rahila Syed <rahilasyed90(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Rahila Syed <rahilasyed(dot)90(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>
Subject: Re: [REVIEW] Re: Compression of full-page-writes
Date: 2014-09-02 13:37:42
Message-ID: 20140902133742.GN11672@aart.rice.edu
Lists: pgsql-hackers

On Tue, Sep 02, 2014 at 10:30:11AM -0300, Arthur Silva wrote:
> On Tue, Sep 2, 2014 at 9:11 AM, Rahila Syed <rahilasyed90(at)gmail(dot)com> wrote:
>
> > Hello,
> >
> > >It'd be interesting to check avg cpu usage as well
> >
> > I have collected average CPU utilization numbers by collecting sar output
> > at interval of 10 seconds for following benchmark:
> >
> > Server specifications:
> > Processors: Intel® Xeon® Processor E5-2650 (2 GHz, 8C/16T, 20 MB) * 2 nos
> > RAM: 32GB
> > Disk : HDD 450GB 10K Hot Plug 2.5-inch SAS HDD * 8 nos
> > 1 x 450 GB SAS HDD, 2.5-inch, 6Gb/s, 10,000 rpm
> >
> > Benchmark:
> >
> > Scale : 16
> > Command : java JR /home/postgres/jdbcrunner-1.2/scripts/tpcc.js
> > -sleepTime 550,250,250,200,200
> >
> > Warmup time : 1 sec
> > Measurement time : 900 sec
> > Number of tx types : 5
> > Number of agents : 16
> > Connection pool size : 16
> > Statement cache size : 40
> > Auto commit : false
> >
> >
> > Checkpoint segments : 1024
> > Checkpoint timeout : 5 mins
> >
> >
> > Average % of CPU utilization at user level for multiple-block compression:
> >
> > Compression Off = 3.34133
> >
> > Snappy = 3.41044
> >
> > LZ4 = 3.59556
> >
> > Pglz = 3.66422
> >
> >
> > The numbers show that the average CPU utilization is in the following order:
> > pglz > LZ4 > Snappy > no compression.
> > Attached is a graph plotting % CPU utilization versus elapsed time for each
> > of the compression algorithms.
> > Also, the overall CPU utilization during the tests is very low, i.e. below 10%;
> > the CPU remained idle for a large (~90%) percentage of the time. I will repeat
> > the above tests with a higher load on the CPU, using the benchmark given by
> > Fujii-san, and post the results.
> >
> >
> > Thank you,
> >
> >
> >
> > On Wed, Aug 27, 2014 at 9:16 PM, Arthur Silva <arthurprs(at)gmail(dot)com> wrote:
> >
> >>
> >> Em 26/08/2014 09:16, "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com> escreveu:
> >>
> >> >
> >> > On Tue, Aug 19, 2014 at 6:37 PM, Rahila Syed <rahilasyed90(at)gmail(dot)com>
> >> wrote:
> >> > > Hello,
> >> > > Thank you for comments.
> >> > >
> >> > >>Could you tell me where the patch for "single block in one run" is?
> >> > > Please find attached patch for single block compression in one run.
> >> >
> >> > Thanks! I ran the benchmark using pgbench and compared the results.
> >> > I'd like to share the results.
> >> >
> >> > [RESULT]
> >> > Amount of WAL generated during the benchmark. Unit is MB.
> >> >
> >> >            Multiple    Single
> >> > off           202.0     201.5
> >> > on           6051.0    6053.0
> >> > pglz         3543.0    3567.0
> >> > lz4          3344.0    3485.0
> >> > snappy       3354.0    3449.5
> >> >
> >> > Latency average during the benchmark. Unit is ms.
> >> >
> >> >            Multiple    Single
> >> > off            19.1      19.0
> >> > on             55.3      57.3
> >> > pglz           45.0      45.9
> >> > lz4            44.2      44.7
> >> > snappy         43.4      43.3
> >> >
> >> > These results show that FPW compression is really helpful for decreasing
> >> > the WAL volume and improving the performance.
> >> >
> >> > The compression ratio with lz4 or snappy is better than that with pglz.
> >> > But it's difficult to conclude which of lz4 and snappy is better,
> >> > according to these results.
> >> >
> >> > ISTM that compression-of-multiple-pages-at-a-time approach can compress
> >> > WAL more than compression-of-single-... does.
> >> >
> >> > [HOW TO BENCHMARK]
> >> > Create a pgbench database with scale factor 1000.
> >> >
> >> > Change the data type of the column "filler" on each pgbench table
> >> > from CHAR(n) to TEXT, and fill the data with the result of pgcrypto's
> >> > gen_random_uuid() in order to avoid empty columns, e.g.,
> >> >
> >> > alter table pgbench_accounts alter column filler type text using
> >> > gen_random_uuid()::text
> >> >
> >> > After creating the test database, run pgbench as follows. The number of
> >> > transactions executed is almost the same in each benchmark run because
> >> > the -R option is used.
> >> >
> >> > pgbench -c 64 -j 64 -r -R 400 -T 900 -M prepared
> >> >
> >> > checkpoint_timeout is 5min, so checkpoints are expected to have been
> >> > executed at least twice during the benchmark.
> >> >
> >> > Regards,
> >> >
> >> > --
> >> > Fujii Masao
> >> >
> >> >
> >>
> >> It'd be interesting to check avg cpu usage as well.
> >>
> >
> >
> Is there any reason to default to LZ4-HC? Shouldn't we try the default as
> well? LZ4-default is known for its near-realtime speed in exchange for a few
> percent of compression ratio, which sounds optimal for this use case.
>
> Also, we might want to compile these libraries with -O3 instead of the
> default -O2. They're finely tuned to take advantage of every compiler
> optimization, with hints and other tricks; this is especially true for LZ4,
> though I'm not sure about snappy.
>
> In my virtual machine, LZ4 compression with -O3 runs at twice the speed
> (950 MB/s) of -O2 (450 MB/s) @ 61.79%; LZ4-HC seems unaffected, though
> (58 MB/s @ 60.27%).
>
> Yes, that's right, almost 1 GB/s! And the compression ratio falls only 1.5%
> short of LZ4-HC's.

Hi,

I agree completely. For day-to-day use we should use LZ4-default. For read-only
tables, it might be nice to "archive" them with LZ4-HC, since the higher
compression would increase read speed and reduce storage space needs. I believe
that LZ4-HC is only slower to compress; decompression speed is unaffected.
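
To illustrate that point, here is a minimal sketch (assumptions: it calls
liblz4 directly rather than the patch's code, and the function names follow
recent liblz4 releases; older releases spell them LZ4_compress() and
LZ4_compressHC()). Both variants produce output that the same decompression
routine reads, so switching between them only changes the compression-side
cost:

/* demo.c: compress a dummy 8 kB "page" with LZ4-default and LZ4-HC,
 * then decompress with the single shared routine. */
#include <stdio.h>
#include <string.h>
#include <lz4.h>     /* LZ4_compress_default, LZ4_decompress_safe */
#include <lz4hc.h>   /* LZ4_compress_HC */

int main(void)
{
    char    page[8192];                             /* stand-in full-page image */
    char    comp[LZ4_COMPRESSBOUND(sizeof(page))];  /* worst-case output size */
    char    back[sizeof(page)];
    int     n_fast, n_hc, n_back;

    memset(page, 'x', sizeof(page));

    /* fast (default) compressor: near-realtime speed */
    n_fast = LZ4_compress_default(page, comp, (int) sizeof(page),
                                  (int) sizeof(comp));

    /* high-compression variant: slower to compress, same output format */
    n_hc = LZ4_compress_HC(page, comp, (int) sizeof(page),
                           (int) sizeof(comp), 9 /* typical HC level */);

    /* decompression path is identical for both variants */
    n_back = LZ4_decompress_safe(comp, back, n_hc, (int) sizeof(back));

    printf("fast=%d bytes, HC=%d bytes, restored=%d bytes\n",
           n_fast, n_hc, n_back);
    return 0;
}

Building it with something like "cc -O3 demo.c -llz4" also ties in with the
-O3 observation above: the higher optimization level only helps the library
code if liblz4 (or snappy) was itself compiled with it, not just the caller.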

Regards,
Ken
