Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows)

From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Douglas McNaught <doug(at)mcnaught(dot)org>, "Stephen R(dot) van den Berg" <srb(at)cuci(dot)nl>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, lar(at)quicklz(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows)
Date: 2009-01-06 01:42:01
Message-ID: 4962B6E9.2060703@mark.mielke.cc
Lists: pgsql-hackers

Gregory Stark wrote:
> Mark Mielke <mark(at)mark(dot)mielke(dot)cc> writes:
>
>> It seems to me that transparent file system compression doesn't have limits
>> like "files must be less than 1 Mbyte to be compressed". They don't exhibit
>> poor file system performance.
>>
>
> Well I imagine those implementations are more complex than toast is. I'm not
> sure what lessons we can learn from their behaviour directly.
>
>> I remember back in the 386/486 days, that I would always DriveSpace compress
>> everything, because hard disks were so slow then that DriveSpace would
>> actually increase performance.
>>
>
> Surely this depends on whether your machine was cpu starved or disk starved?
> Do you happen to recall which camp these anecdotal machines from 1980 fell in?
>

I agree. I'm sure it was disk I/O starved - and maybe not just the disk.
The motherboard might have contributed. :-)

My production machine in 2008/2009 still seems I/O bound for my uses.
The main database server I use has 2 x Intel Xeon 3.0 GHz (dual-core) = 4
cores, and the uptime load average for the whole system is currently
0.10. The database and web server have their own 4 drives in RAID 10
(the main system is on two other drives). Yes, I could always upgrade to a
fancier/larger RAID array, SAS, 15k RPM drives, etc., but if a PostgreSQL
tweak were to give me 30% more performance at a 15% CPU cost... I think
that would be a great alternative. :-)

Memory may also play a part. My server at home has 4 Mbytes of L2 cache
and 4 Gbytes of RAM running 5-5-5-18 DDR2 at 1000 MHz. At these
speeds, my realized bandwidth for RAM is 6.0+ Gbyte/s, and my L1/L2 operate
at 10.0+ Gbyte/s. Compression doesn't run that fast, so at least for me
the benefit of having something in L1/L2 cache vs. RAM isn't great.
However, my disks in the RAID 10 configuration only read/write at
~150 Mbyte/s sustained, and much less if seeking is required. Compressing
the data means roughly 30% more data may fit into RAM, or roughly 30% more
data effectively read from disk per second, since I assume many compression
algorithms can beat 150 Mbyte/s.
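
To put rough numbers on that (purely back-of-envelope; the 30% savings
and the 300 Mbyte/s decompression rate below are assumptions on my part,
not measurements):

    /* Back-of-envelope: effective logical read bandwidth when the data
     * on disk is compressed.  All figures below are assumptions. */
    #include <stdio.h>

    int main(void)
    {
        double disk_mb_s   = 150.0; /* sustained RAID 10 read, Mbyte/s (assumed) */
        double size_ratio  = 0.70;  /* compressed/original size, i.e. 30% savings (assumed) */
        double decomp_mb_s = 300.0; /* decompression throughput, Mbyte/s (assumed) */

        /* Logical bytes per second are limited by whichever is slower:
         * the disk feeding compressed bytes, or the CPU expanding them. */
        double from_disk = disk_mb_s / size_ratio;  /* ~214 Mbyte/s of logical data */
        double effective = from_disk < decomp_mb_s ? from_disk : decomp_mb_s;

        printf("effective logical read bandwidth: ~%.0f Mbyte/s\n", effective);
        return 0;
    }

So as long as decompression keeps ahead of the disk, the slower device
(the disk) is the one that gets the ~30% boost.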

Is my configuration typical? It's probably becoming more so. Certainly
more common than 10+ disk hardware RAID configurations.

>> The toast tables already give a sort of block-addressable scheme.
>> Compression can be on a per block or per set of blocks basis allowing for
>> seek into the block,
>>
>
> The current toast architecture is that we compress the whole datum, then store
> the datum either inline or using the same external blocking mechanism that we
> use when not compressing. So this doesn't fit at all.
> It does seem like an interesting idea to have toast chunks which are
> compressed individually. So each chunk could be, say, an 8kb chunk of
> plaintext and stored as whatever size it ends up being after compression. That
> would allow us to do random access into external chunks as well as allow
> overlaying the cpu costs of decompression with the i/o costs. It would get a
> lower compression ratio than compressing the whole object together but we
> would have to experiment to see how big a problem that was.
>
> It would be pretty much rewriting the toast mechanism for external compressed
> data though. Currently the storage and the compression are handled separately.
> This would tie the two together in a separate code path.
>
> Hm, It occurs to me we could almost use the existing code. Just store it as a
> regular uncompressed external datum but allow the toaster to operate on the
> data column (which it's normally not allowed to) to compress it, but not store
> it externally.
>
Yeah - sounds like it could be messy.
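
Just to make sure I follow the per-chunk idea, I picture something roughly
like this - a pure sketch, where decompress_chunk() is a stand-in
placeholder, not the real pglz code:

    #include <string.h>

    #define RAW_CHUNK 8192          /* uncompressed size of each chunk */

    typedef struct
    {
        int     nchunks;
        size_t *comp_len;           /* compressed length of each chunk */
        char  **comp_data;          /* compressed bytes of each chunk  */
    } chunked_datum;

    /* Stand-in for a real decompressor; it just copies so the sketch is
     * self-contained.  Returns the number of uncompressed bytes produced. */
    static size_t
    decompress_chunk(const char *src, size_t slen, char *dst, size_t dcap)
    {
        size_t n = slen < dcap ? slen : dcap;
        memcpy(dst, src, n);
        return n;
    }

    /* Fetch 'len' bytes starting at 'offset', decompressing only the
     * chunks that overlap the requested range - i.e. random access. */
    static void
    fetch_slice(const chunked_datum *d, size_t offset, size_t len, char *out)
    {
        int    first = (int) (offset / RAW_CHUNK);
        int    last  = (int) ((offset + len - 1) / RAW_CHUNK);
        char   raw[RAW_CHUNK];
        size_t copied = 0;

        for (int i = first; i <= last && i < d->nchunks; i++)
        {
            size_t raw_len = decompress_chunk(d->comp_data[i], d->comp_len[i],
                                              raw, sizeof(raw));
            size_t start   = (i == first) ? offset % RAW_CHUNK : 0;
            size_t take    = (raw_len > start) ? raw_len - start : 0;

            if (take > len - copied)
                take = len - copied;
            memcpy(out + copied, raw + start, take);
            copied += take;
        }
    }

The obvious costs are the lower ratio from compressing 8k at a time, plus
having to track the compressed chunk sizes somewhere, which I guess is
where the messiness comes in.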

>> or if compression doesn't seem to be working for the first few blocks, the
>> later blocks can be stored uncompressed? Or is that too complicated compared
>> to what we have now? :-)
>>
>
> Actually we do that now, it was part of the same patch we're discussing.
>

Cheers,
mark

--
Mark Mielke <mark(at)mielke(dot)cc>
