
Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows)

From: Mark Mielke <mark@mark.mielke.cc>
To: Gregory Stark <stark@enterprisedb.com>
Cc: Robert Haas <robertmhaas@gmail.com>, Douglas McNaught <doug@mcnaught.org>, "Stephen R. van den Berg" <srb@cuci.nl>, Alvaro Herrera <alvherre@commandprompt.com>, lar@quicklz.com, pgsql-hackers@postgresql.org
Subject: Re: QuickLZ compression algorithm (Re: Inclusion in the PostgreSQL backend for toasting rows)
Date: 2009-01-06 01:42:01
Message-ID: 4962B6E9.2060703@mark.mielke.cc
Lists: pgsql-hackers
Gregory Stark wrote:
> Mark Mielke <mark@mark.mielke.cc> writes:
>   
>> It seems to me that transparent file system compression doesn't have limits
>> like "files must be less than 1 Mbyte to be compressed". They don't exhibit
>> poor file system performance.
>>     
>
> Well I imagine those implementations are more complex than toast is. I'm not
> sure what lessons we can learn from their behaviour directly.
>   
>> I remember back in the 386/486 days, that I would always DriveSpace compress
>> everything, because hard disks were so slow then that DriveSpace would
>> actually increase performance.
>>     
>
> Surely this depends on whether your machine was cpu starved or disk starved?
> Do you happen to recall which camp these anecdotal machines from 1980 fell in?
>   

I agree. I'm sure it was disk I/O starved - and maybe not just the disk. 
The motherboard might have contributed. :-)

My production machine in 2008/2009, for my uses, still seems I/O bound.
The main database server I use is 2 x Intel Xeon 3.0 GHz (dual-core) = 4
cores, and the uptime load average for the whole system is currently
0.10. The database and web server use their own 4 drives in RAID 10
(the main system is on two other drives). Yes, I could always upgrade to a
fancier/larger RAID array, SAS, 15k RPM drives, etc., but if a PostgreSQL
tweak were to give me 30% more performance at a 15% CPU cost... I think
that would be a great alternative. :-)

Memory may also play a part. My server at home has 4 Mbytes of L2 cache
and 4 Gbytes of RAM running with 5-5-5-18 DDR2 at 1000 MHz. At these
speeds, my realized bandwidth for RAM is 6.0+ Gbyte/s, and my L1/L2
operate at 10.0+ Gbyte/s. Compression doesn't run that fast, so at least
for me the benefit of having something in L1/L2 cache vs. RAM isn't
great. However, my disks in the RAID 10 configuration only read/write at
~150 Mbyte/s sustained, and much less if seeking is required. Compressing
the data means 30% more data may fit into RAM, or a 30% increase in data
read from disk, as I assume many compression algorithms can beat 150
Mbyte/s.
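
To put rough numbers on that, a quick back-of-the-envelope model (the
decompression speed and the ~0.77 compression ratio below are assumptions
for illustration, not measurements):

/* Effective read bandwidth when on-disk data is compressed, assuming
 * the I/O and decompression stages can overlap. */
#include <stdio.h>

int main(void)
{
    double disk_mb_s   = 150.0; /* sustained RAID 10 read, from above */
    double decomp_mb_s = 500.0; /* assumed decompression throughput   */
    double ratio       = 0.77;  /* compressed size / original size    */

    /* Reading R logical bytes costs R * ratio bytes of disk I/O plus
     * R bytes of decompression work; pipelined, the slower stage wins. */
    double io_limited = disk_mb_s / ratio;
    double effective  = io_limited < decomp_mb_s ? io_limited : decomp_mb_s;

    printf("effective logical read rate: ~%.0f Mbyte/s (vs %.0f raw)\n",
           effective, disk_mb_s);
    return 0;
}

As long as decompression stays well ahead of ~195 Mbyte/s, the disk
remains the bottleneck and the compression is nearly free on reads.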

Is my configuration typical? It's probably becoming more so. Certainly 
more common than the 10+ disk hardware RAID configurations.


>> The toast tables already give a sort of block-addressable scheme.
>> Compression can be on a per block or per set of blocks basis allowing for
>> seek into the block,
>>     
>
> The current toast architecture is that we compress the whole datum, then store
> the datum either inline or using the same external blocking mechanism that we
> use when not compressing. So this doesn't fit at all.
> It does seem like an interesting idea to have toast chunks which are
> compressed individually. So each chunk could be, say, an 8kb chunk of
> plaintext and stored as whatever size it ends up being after compression. That
> would allow us to do random access into external chunks as well as allow
> overlaying the cpu costs of decompression with the i/o costs. It would get a
> lower compression ratio than compressing the whole object together but we
> would have to experiment to see how big a problem that was.
>
> It would be pretty much rewriting the toast mechanism for external compressed
> data though. Currently the storage and the compression are handled separately.
> This would tie the two together in a separate code path.
>
> Hm, It occurs to me we could almost use the existing code. Just store it as a
> regular uncompressed external datum but allow the toaster to operate on the
> data column (which it's normally not allowed to) to compress it, but not store
> it externally.
>   
Yeah - sounds like it could be messy.
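
To make the per-chunk idea concrete anyway, a rough sketch of the shape
I had pictured (hypothetical structures, nothing like the existing toast
code):

/* Compress fixed-size plaintext chunks independently and keep a small
 * index of compressed offsets so random access stays cheap. */
#include <stdint.h>
#include <stdlib.h>

#define CHUNK_RAW 8192                /* plaintext bytes per chunk */

typedef struct
{
    uint32_t  nchunks;
    uint32_t *comp_off;               /* offset of each compressed chunk;
                                       * a flag bit could mark chunks that
                                       * were stored uncompressed */
} ChunkIndex;

/* Random access: map a logical offset to the one chunk that must be
 * fetched and decompressed, instead of the whole datum. */
static uint32_t
chunk_for_offset(const ChunkIndex *idx, uint64_t pos, uint32_t *off_in_chunk)
{
    uint32_t chunkno = (uint32_t) (pos / CHUNK_RAW);

    if (chunkno >= idx->nchunks)
        abort();                      /* offset past end of datum */
    *off_in_chunk = (uint32_t) (pos % CHUNK_RAW);
    return chunkno;                   /* caller reads comp_off[chunkno] */
}

The index costs a few bytes per chunk, and the per-chunk ratio will be
worse than compressing the whole datum, which is presumably where the
"experiment to see how big a problem that was" comes in.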

>> or if compression doesn't seem to be working for the first few blocks, the
>> later blocks can be stored uncompressed? Or is that too complicated compared
>> to what we have now? :-)
>>     
>
> Actually we do that now, it was part of the same patch we're discussing.
>   
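
If I understand the existing behaviour correctly, it's roughly this
shape (a paraphrase for illustration, not the actual pglz source):

/* Early-abandonment check: give up on compression if the first part of
 * the input shows no gain, and store the datum uncompressed instead. */
#include <stdbool.h>
#include <stddef.h>

static bool
worth_continuing(size_t raw_consumed, size_t comp_emitted,
                 size_t give_up_after)
{
    /* After scanning 'give_up_after' input bytes, if the output is no
     * smaller than the input, bail out and store raw. */
    if (raw_consumed >= give_up_after && comp_emitted >= raw_consumed)
        return false;
    return true;
}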

Cheers,
mark

-- 
Mark Mielke <mark@mielke.cc>
