Re: Are there plans to add data compression feature to postgresql?

From: Ivan Sergio Borgonovo <mail(at)webthatworks(dot)it>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Are there plans to add data compression feature to postgresql?
Date: 2008-10-31 14:12:06
Message-ID: 20081031151206.4bc99c4a@dawn.webthatworks.it
Lists: pgsql-general

On Fri, 31 Oct 2008 08:49:56 +0000
Gregory Stark <stark(at)enterprisedb(dot)com> wrote:

> "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com> writes:
>
> > What is the torn page problem? Note I'm no big fan of
> > compressed file systems, but I can't imagine them not working
> > with databases, as I've seen them work quite reliably under
> > Exchange server running a db oriented storage subsystem. And I
> > can't imagine them not being invisible to an application,
> > otherwise you'd just be asking for trouble.

> Invisible under normal operation sure, but when something fails the
> consequences will surely be different and I can't see how you
> could make a compressed filesystem safe without a huge performance
> hit.

Pardon my naivety, but I don't see why compression and data
integrity should always be considered conflicting factors.

DB operations are supposed to be atomic, provided fsync actually does
what it is supposed to do.
So coherency is assured by a proper "fsync" going down through all HW
levels until the data reaches permanent storage.
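
To make the fsync assumption concrete, here is a minimal Python sketch
of a write that returns only after the OS has been asked to push the
data to the device. The function and file names are made up for
illustration, and whether the lower HW levels honour the request is
exactly the part the program cannot check.

```python
import os
import tempfile


def durable_append(path: str, payload: bytes) -> None:
    """Append payload and return only after the OS was asked to flush it."""
    with open(path, "ab") as f:
        f.write(payload)       # lands in Python's userspace buffer
        f.flush()              # userspace buffer -> kernel page cache
        os.fsync(f.fileno())   # kernel page cache -> storage device


def demo() -> bytes:
    """Write two records to a scratch file and read them back."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "wal.bin")   # hypothetical file name
        durable_append(path, b"record-1\n")
        durable_append(path, b"record-2\n")
        with open(path, "rb") as f:
            return f.read()
```

If a drive or controller acknowledges the flush while the data is
still in a volatile cache, the call returns all the same, and the
coherency guarantee is gone.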

Now suppose your problem is avoiding data loss, not avoiding loss of
coherency.
E.g. you have a very fast stream of data coming from the LHC.
The faster you write to disk, the lower the chance of losing data
if you incur some kind of hardware failure during the write.
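
A rough back-of-the-envelope model of that claim, as a Python sketch.
It assumes a pipelined writer where compression and disk writes
overlap, and all names and figures are hypothetical, not measurements.

```python
def effective_ingest_rate(disk_mb_s: float,
                          compressor_mb_s: float,
                          ratio: float) -> float:
    """Uncompressed MB/s that can be made durable by a pipelined writer.

    ratio is uncompressed size / compressed size (2.0 halves the data).
    """
    if min(disk_mb_s, compressor_mb_s, ratio) <= 0:
        raise ValueError("all parameters must be positive")
    # The disk absorbs disk_mb_s of compressed bytes, which is
    # disk_mb_s * ratio of original data; the compressor caps the feed.
    return min(compressor_mb_s, disk_mb_s * ratio)


def exposure_seconds(burst_mb: float,
                     disk_mb_s: float,
                     compressor_mb_s=None,
                     ratio: float = 1.0) -> float:
    """Time a burst stays at risk (not yet on disk).

    compressor_mb_s=None means the burst is written uncompressed.
    """
    if compressor_mb_s is None:
        return burst_mb / disk_mb_s
    rate = effective_ingest_rate(disk_mb_s, compressor_mb_s, ratio)
    return burst_mb / rate
```

With a 100 MB/s disk, a 2:1 ratio and a 500 MB/s compressor, a 1000 MB
burst is exposed for 5 seconds instead of 10. With a 50 MB/s
compressor the same burst is exposed for 20 seconds, so the compressor
has to outrun the disk for the argument to hold.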

Whether you choose data compression or not depends on which kind of
failure you think is more probable on your hardware, and on the
associated costs.

If you expect gamma rays cooking your SCSI cables or an asteroid
smashing your UPS, compression may be a good choice... it will make
your data reach permanent storage faster.
If you expect your permanent storage to store data unreliably (and
not report it), the loss of 1 sector may correspond to a larger loss
of data.
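
The "1 sector, larger loss" effect can be sketched in Python with
zlib. The 512-byte sector, the record layout and the data are all
invented for the demo, and zlib stands in for whatever compressor a
filesystem or database would really use.

```python
import random
import zlib

SECTOR = 512  # bytes; an assumed sector size


def make_records(n: int = 2000, seed: int = 42) -> list:
    """Fixed-width fake event records (made-up data, 24 bytes each)."""
    rng = random.Random(seed)
    return [b"event %06d energy %04d" % (i, rng.randrange(10_000))
            for i in range(n)]


def lose_sector(blob: bytes, offset: int) -> bytes:
    """Simulate a silently zeroed sector starting at offset."""
    return blob[:offset] + bytes(SECTOR) + blob[offset + SECTOR:]


def intact_records(blob: bytes, originals: set) -> int:
    """Count newline-separated lines that still match an original record."""
    return sum(1 for line in blob.split(b"\n") if line in originals)


def survivors(offset: int = 1024) -> tuple:
    """Return (records surviving in the raw file, in the compressed one)."""
    records = make_records()
    originals = set(records)
    raw = b"".join(r + b"\n" for r in records)
    packed = zlib.compress(raw)
    # The damaged sector must lie inside the compressed stream.
    assert len(packed) > offset + SECTOR

    raw_ok = intact_records(lose_sector(raw, offset), originals)
    try:
        unpacked = zlib.decompress(lose_sector(packed, offset))
        packed_ok = intact_records(unpacked, originals)
    except zlib.error:
        packed_ok = 0  # one-shot decompression gives up on the whole stream
    return raw_ok, packed_ok
```

In the raw file only the couple of dozen records overlapping the
sector are lost. The one-shot decompression is the pessimistic case: a
streaming decoder could probably salvage what precedes the damage, and
compressing small blocks independently would bound the loss, at the
price of a worse ratio.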

Another thing that belongs in the equation when working out where
your risk of data loss lies is whether your "data source" has some
form of "data persistence".
If it has, you could introduce one more layer of "fsyncing": your
data source does not wipe the original copy until your DB reports
back that everything went fine (no asteroid etc...).
So data compression may be just one more tool to manage your budget
for asteroid shelters.
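
That extra layer of "fsyncing" could look roughly like the Python
sketch below. The class, the store() callback and the flaky sink are
all hypothetical, only meant to show the source keeping its copy until
the DB acknowledges it.

```python
class RetainingSource:
    """Data source that keeps each item until the sink acknowledges it."""

    def __init__(self, items):
        self.pending = dict(enumerate(items))  # id -> payload, not yet safe

    def deliver(self, store) -> int:
        """Offer every pending item to store(); drop only acknowledged ones.

        store(payload) must return True once the payload is durable.
        Returns how many items are still pending afterwards.
        """
        for item_id in sorted(self.pending):
            try:
                acked = store(self.pending[item_id])
            except OSError:
                acked = False  # treat an I/O failure as "not stored"
            if acked:
                del self.pending[item_id]
        return len(self.pending)


def flaky_sink(fail_once):
    """Hypothetical sink that refuses each payload in fail_once one time."""
    already_failed = set()
    stored = []

    def store(payload) -> bool:
        if payload in fail_once and payload not in already_failed:
            already_failed.add(payload)
            return False
        stored.append(payload)
        return True

    return store, stored
```

Calling deliver() again after a failure retries only what was never
acknowledged, so a crash between the write and the acknowledgement
costs a retransmission rather than data.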

An annoyance of compression may be that, while *on average* it may
let you put data on permanent storage faster, it increases the
uncertainty of the instantaneous transfer speed, especially if fs
level and db level compression are not aware of each other and fs
level compression knows less about which data is worth compressing.
If I had to push further for data compression, I'd make it data-type
aware and switchable (or auto-switchable based on ANALYZE or stats
results).
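
A toy version of the "auto-switchable" idea, as a Python sketch: take
a sample of a column's values, measure how well it compresses, and
switch compression on only if it pays off. This is not how ANALYZE or
any PostgreSQL facility works; the function names and the 1.5
threshold are assumptions picked for illustration.

```python
import random
import zlib


def sampled_ratio(sample: list) -> float:
    """Uncompressed/compressed size ratio of a sample of column values."""
    raw = b"".join(sample)
    if not raw:
        return 1.0  # nothing to measure, so assume no gain
    return len(raw) / len(zlib.compress(raw))


def should_compress(sample: list, min_ratio: float = 1.5) -> bool:
    """Switch compression on only when the sample shrinks enough."""
    return sampled_ratio(sample) >= min_ratio


def demo() -> tuple:
    """Decide for a repetitive text column and a random-looking one."""
    rng = random.Random(0)
    text_column = [b"status=ok;host=db01;"] * 200
    blob_column = [bytes(rng.getrandbits(8) for _ in range(32))
                   for _ in range(200)]
    return should_compress(text_column), should_compress(blob_column)
```

The repetitive text column qualifies and the random-looking one does
not, which is the kind of knowledge the db level has and the fs level
lacks.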

Of course if you expect to have faulty "permanent storage", data
compression *may* not be a good bet... but it still depends on
hardware cost, compression ratio, specific kind of failure...
E.g. the more you compress, the cheaper RAID becomes...

I understand, Tom, that DBAs are paid to be paranoid, and I really
really really appreciate data stored in a format that doesn't require
a long chain of tools to be read. I really do hate dependencies that
translate into hours of *boring* work if something goes wrong.

BTW I glanced at the MonetDB papers posted earlier, and it seems
that their compression algorithms are strongly optimised for
read-only search.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
