Re: libpq compression

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Euler Taveira <euler(at)timbira(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Claes Jakobsson <claes(at)versed(dot)se>
Subject: Re: libpq compression
Date: 2012-06-25 13:12:46
Message-ID: 73447F47-E9A3-420B-8903-9F6A4513E229@phlo.org
Lists: pgsql-hackers

On Jun25, 2012, at 04:04 , Robert Haas wrote:
> If, for
> example, someone can demonstrate that an awesomebsdlz compresses 10x
> as fast as OpenSSL... that'd be pretty compelling.

That, actually, is demonstrably the case for at least Google's snappy (and
for LZO, but that's not an option, since its license is GPL). The snappy
authors state in their documentation that

    In our tests, Snappy usually is faster than algorithms in the same class
    (e.g. LZO, LZF, FastLZ, QuickLZ, etc.) while achieving comparable
    compression ratios.

The only widely supported compression method for SSL seems to be DEFLATE,
which is also what gzip/zlib uses. I benchmarked LZO against gzip/zlib
a few months ago, and LZO outperformed zlib in fast mode (i.e. gzip -1) by
an order of magnitude.
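
For a concrete point of comparison, zlib's fast mode can also be exercised
directly from C rather than through gzip. The sketch below is my own
illustration (not part of the measurements here); it assumes zlib is
installed and simply times compress2() at level 1, which is roughly what
gzip -1 and SSL's DEFLATE do, on a synthetic in-memory buffer.

/* Illustrative only: time zlib level 1 on an in-memory buffer.
 * Build with: cc -O2 zlibbench.c -lz  (file name is hypothetical) */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <zlib.h>

int main(void)
{
    size_t  len = 128 * 1024 * 1024;       /* 128 MB, matching the dd tests below */
    unsigned char *in = malloc(len);
    uLongf  outlen = compressBound(len);
    unsigned char *out = malloc(outlen);

    if (in == NULL || out == NULL)
        return 1;
    memset(in, 'x', len);                  /* trivially compressible; substitute real data */

    clock_t start = clock();
    if (compress2(out, &outlen, in, len, 1) != Z_OK)   /* level 1 ~ gzip -1 */
        return 1;
    double secs = (double) (clock() - start) / CLOCKS_PER_SEC;

    printf("zlib -1: %zu -> %lu bytes, %.0f MB/s\n",
           len, (unsigned long) outlen, len / (1024.0 * 1024.0) / secs);
    return 0;
}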

The compression ratio achieved by DEFLATE/gzip/zlib is much better, though.
The snappy documentation states

    Typical compression ratios (based on the benchmark suite) are about
    1.5-1.7x for plain text, about 2-4x for HTML, and of course 1.0x for
    JPEGs, PNGs and other already-compressed data. Similar numbers for zlib
    in its fastest mode are 2.6-2.8x, 3-7x and 1.0x, respectively.

Here are a few numbers for LZO vs. gzip. Snappy should be comparable to
LZO - I tested LZO because I still had the command-line compressor lzop
lying around on my machine, whereas I'd have needed to download and compile
snappy first.

$ dd if=/dev/random of=data bs=1m count=128
$ time gzip -1 < data > data.gz
real 0m6.189s
user 0m5.947s
sys 0m0.224s
$ time lzop < data > data.lzo
real 0m2.697s
user 0m0.295s
sys 0m0.224s
$ ls -lh data*
-rw-r--r-- 1 fgp staff 128M Jun 25 14:43 data
-rw-r--r-- 1 fgp staff 128M Jun 25 14:44 data.gz
-rw-r--r-- 1 fgp staff 128M Jun 25 14:44 data.lzo

$ dd if=/dev/zero of=zeros bs=1m count=128
$ time gzip -1 < zeros > zeros.gz
real 0m1.083s
user 0m1.019s
sys 0m0.052s
$ time lzop < zeros > zeros.lzo
real 0m0.186s
user 0m0.123s
sys 0m0.053s
$ ls -lh zeros*
-rw-r--r-- 1 fgp staff 128M Jun 25 14:47 zeros
-rw-r--r-- 1 fgp staff 572K Jun 25 14:47 zeros.gz
-rw-r--r-- 1 fgp staff 598K Jun 25 14:47 zeros.lzo

To summarize, on my 2.66 GHz Core2 Duo MacBook Pro, LZO compresses at about
350 MB/s if the data is purely random, and at about 800 MB/s if the data
compresses extremely well. (These numbers are based on user time, since that
reflects the CPU time actually spent compressing and ignores the IO overhead,
which is substantial.)
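
Since I measured LZO via lzop rather than testing snappy itself, here is a
rough sketch (again my own illustration, not one of the measurements above)
of how snappy could be timed directly through its C bindings. It assumes the
snappy library and its snappy-c.h wrapper are installed.

/* Illustrative only: time snappy via its C bindings on an in-memory buffer.
 * Build with: cc -O2 snappybench.c -lsnappy  (file name is hypothetical) */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <snappy-c.h>

int main(void)
{
    size_t  len = 128 * 1024 * 1024;       /* 128 MB, matching the dd tests above */
    char   *in = malloc(len);
    size_t  outlen = snappy_max_compressed_length(len);
    char   *out = malloc(outlen);

    if (in == NULL || out == NULL)
        return 1;
    memset(in, 'x', len);                  /* trivially compressible; substitute real data */

    clock_t start = clock();
    if (snappy_compress(in, len, out, &outlen) != SNAPPY_OK)
        return 1;
    double secs = (double) (clock() - start) / CLOCKS_PER_SEC;

    printf("snappy: %zu -> %zu bytes, %.0f MB/s\n",
           len, outlen, len / (1024.0 * 1024.0) / secs);
    return 0;
}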

IMHO, the only compelling argument (and a very compelling one) for using
SSL compression was that it requires very little code on our side. We've
since discovered that it's not actually that simple, at least if we want
to support compression without authentication or encryption, and don't
want to restrict ourselves to using OpenSSL forever. So unless we give
up at least one of those requirements, the arguments for using
SSL compression are rather thin, I think.

best regards,
Florian Pflug
