Re: Improve compression speeds in pg_lzcompress.c

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Takeshi Yamamuro <yamamuro(dot)takeshi(at)lab(dot)ntt(dot)co(dot)jp>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improve compression speeds in pg_lzcompress.c
Date: 2013-01-08 22:14:20
Message-ID: CA+TgmoaHNNChKioSSt1huAKb9AN3GZ1njGfgNS85QGCyjxXXOQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 8, 2013 at 9:51 AM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
> On Tue, Jan 8, 2013 at 10:20 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Tue, Jan 8, 2013 at 4:04 AM, Takeshi Yamamuro
>> <yamamuro(dot)takeshi(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>> Apart from my patch, what concerns me is that the current compressor
>>> may be slow relative to I/O. For example, when compressing and
>>> writing large values, compressing the data (20-40MiB/s) can become a
>>> drag compared with writing the data to disk (50-80MiB/s). Moreover,
>>> IMHO modern (and very fast) I/O subsystems such as SSDs make this an
>>> even bigger issue.
>>
>> What about just turning compression off?
>
> I've been relying on compression for some big serialized blob fields
> for some time now. I bet I'm not alone; lots of people save serialized
> data to text fields. So rather than removing it, I'd just change the
> default to off (if that was the decision).
>
> However, it might be best to evaluate some of the modern fast
> compression schemes like snappy/lz4 (250MB/s per core sounds pretty
> good), and implement pluggable compression schemes instead. Snappy
> wasn't designed for nothing; it was most likely designed because it was
> necessary. Cassandra (just to name a system I'm familiar with) started
> without compression, and then it was deemed necessary to the point
> they invested considerable time into it. I've always found the fact
> that pg does compression of toast tables quite forward-thinking, and
> I'd say the feature has to remain there, extended and modernized,
> maybe off by default, but there.

I'm not offering any opinion on whether we should have compression as
a general matter. Maybe yes, maybe no, but my question was about the
OP's use case. If he's willing to accept less efficient compression
in order to get faster compression, perhaps he should just not use
compression at all.

Personally, my biggest gripe about the way we do compression is that
it's easy to detoast the same object lots of times. More generally,
our in-memory representation of user data values is pretty much a
mirror of our on-disk representation, even when that leads to excess
conversions. Beyond what we do for TOAST, there's stuff like numeric,
where we not only detoast but then post-process the result into yet
another internal form before performing any calculations - and then of
course we have to convert back before returning from the calculation
functions. And for things like XML, JSON, and hstore we have to
repeatedly parse the string every time someone wants to do anything
with it. Of course, solving this is a very hard problem, and not
solving it isn't a reason not to have more compression options - but
more compression options will not solve the problems that I personally
have in this area, by and large.
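
To make the detoasting point concrete, here's a minimal sketch of the
standard fmgr calling convention (the function name is made up, but the
pattern is the usual one): every call detoasts - and, if need be,
decompresses - its argument from scratch, because nothing caches the
uncompressed form across calls.

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(my_text_octet_length);

Datum
my_text_octet_length(PG_FUNCTION_ARGS)
{
    /*
     * PG_GETARG_TEXT_PP detoasts the argument: if the value is stored
     * compressed and/or out of line, it is fetched and decompressed
     * right here, every time the function is called.
     */
    text       *t = PG_GETARG_TEXT_PP(0);

    PG_RETURN_INT32(VARSIZE_ANY_EXHDR(t));
}

Evaluate something like that several times against the same wide value
in a single query and the decompression work is simply repeated.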

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
