Re: Are there plans to add data compression feature to postgresql?

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, 小波 顾 <guxiaobo1982(at)hotmail(dot)com>, Grzegorz Jaśkiewicz <gryzman(at)gmail(dot)com>, chris(dot)ellis(at)shropshire(dot)gov(dot)uk, "pgsql-general\(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Are there plans to add data compression feature to postgresql?
Date: 2008-10-31 08:49:56
Message-ID: 87fxmd9miz.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


"Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com> writes:

> What is the torn page problem? Note I'm no big fan of compressed file
> systems, but I can't imagine them not working with databases, as I've
> seen them work quite reliably under exhange server running a db
> oriented storage subsystem. And I can't imagine them not being
> invisible to an application, otherwise you'd just be asking for
> trouble.

Invisible under normal operation sure, but when something fails the
consequences will surely be different and I can't see how you could make a
compressed filesystem safe without a huge performance hit.

The torn page problem is what happens if the system loses power or crashes
when only part of the data written has made it to disk. If you're compressing
or encrypting data then you can't expect the old data portion and the new data
portion to make sense together.

So for example if Postgres sets a hint bit on one tuple in a block, then
writes out that block and the filesystem recompresses it, the entire block
will change. If the system crashes when only 4k of it has reached disk then
when we read in that block it will fail decompression.

And if the block size of the compressed filesystem is larger than the
PostgreSQL block size your problems are even more severe. Even a regular
WAL-logged write to a database block can cause the subsequent database block
to become unreadable if power is lost before the entire set of database blocks
within the filesystem block is written.

The only way I could see this working is if you use a filesystem which logs
data changes like ZFS or ext3 with data=journal. Even then you have to be very
careful to make the filesystem block size that the journal treats as atomic
match the Postgres block size or you'll still be in trouble.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Nick Mellor 2008-10-31 09:13:00 Re: Slow query performance
Previous Message Magnus Hagander 2008-10-31 08:22:08 Re: Decreasing WAL size effects