Gregory Stark wrote:
> Mark Mielke <mark(at)mark(dot)mielke(dot)cc> writes:
>> It seems to me that transparent file system compression doesn't have limits
>> like "files must be less than 1 Mbyte to be compressed". They don't exhibit
>> poor file system performance.
> Well I imagine those implementations are more complex than toast is. I'm not
> sure what lessons we can learn from their behaviour directly.
>> I remember back in the 386/486 days, that I would always DriveSpace compress
>> everything, because hard disks were so slow then that DriveSpace would
>> actually increase performance.
> Surely this depends on whether your machine was cpu starved or disk starved?
> Do you happen to recall which camp these anecdotal machines from 1980 fell in?
I agree. I'm sure it was disk I/O starved - and maybe not just the disk.
The motherboard might have contributed. :-)
My production machine in 2008/2009 still seems I/O bound for my uses.
The main database server I use is 2 x Intel Xeon 3.0 Ghz (dual-core) = 4
cores, and the uptime load average for the whole system is currently
0.10. The database and web server use their own 4 drives with RAID 10
(main system is on two other drives). Yes, I could always upgrade to a
fancy/larger RAID array, SAS, 15k RPM drives, etc. but if a PostgreSQL
tweak were to give me 30% more performance at a 15% CPU cost... I think
that would be a great alternative option. :-)
Memory may also play a part. My server at home has 4Mbytes of L2 cache
and 4Gbytes of RAM running with 5-5-5-18 DDR2 at 1000Mhz. At these
speeds, my realized bandwidth for RAM is 6.0+ Gbyte/s. My L1/L2 operate
at 10.0+ Gbyte/s. Compression doesn't run that fast, so at least for me,
the benefit of having something in L1/L2 cache vs RAM isn't great,
however, my disks in the RAID10 configuration only read/write at
~150 Mbyte/s sustained, and much less if seeking is required. Compressing
the data means 30% more data may fit into RAM, or a 30% increase in the
data read from disk, as I assume many compression algorithms can
decompress faster than 150 Mbyte/s.
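The arithmetic can be sketched as a toy model (the 150 Mbyte/s and 30%
figures are the ones above; the decompression speeds are illustrative
assumptions, not measurements):

```python
DISK_MB_S = 150.0   # sustained RAID10 read rate from the message, MB/s
RATIO = 0.30        # fraction of bytes saved by compression

def effective_read_rate(disk_mb_s, ratio, decompress_mb_s):
    """MB/s of logical (uncompressed) data delivered per second.

    Reading compressed data pulls only (1 - ratio) physical bytes per
    logical byte, so the disk effectively delivers
    disk / (1 - ratio) logical MB/s -- unless decompression itself is
    the bottleneck.
    """
    disk_limited = disk_mb_s / (1.0 - ratio)
    return min(disk_limited, decompress_mb_s)

# A decompressor much faster than the disk: compression is a net win.
print(effective_read_rate(DISK_MB_S, RATIO, 800.0))   # ~214 MB/s
# A decompressor slower than the disk erases the win.
print(effective_read_rate(DISK_MB_S, RATIO, 120.0))   # 120 MB/s
```

So on this kind of disk-bound box the win holds as long as the codec
decompresses faster than roughly disk_rate / (1 - ratio).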
Is my configuration typical? It's probably becoming more so. Certainly
more common than the 10+ disk hardware RAID configurations.
>> The toast tables already give a sort of block-addressable scheme.
>> Compression can be on a per block or per set of blocks basis allowing for
>> seek into the block,
> The current toast architecture is that we compress the whole datum, then store
> the datum either inline or using the same external blocking mechanism that we
> use when not compressing. So this doesn't fit at all.
> It does seem like an interesting idea to have toast chunks which are
> compressed individually. So each chunk could be, say, an 8kb chunk of
> plaintext and stored as whatever size it ends up being after compression. That
> would allow us to do random access into external chunks as well as allow
> overlaying the cpu costs of decompression with the i/o costs. It would get a
> lower compression ratio than compressing the whole object together but we
> would have to experiment to see how big a problem that was.
> It would be pretty much rewriting the toast mechanism for external compressed
> data though. Currently the storage and the compression are handled separately.
> This would tie the two together in a separate code path.
> Hm, It occurs to me we could almost use the existing code. Just store it as a
> regular uncompressed external datum but allow the toaster to operate on the
> data column (which it's normally not allowed to) to compress it, but not store
> it externally.
Yeah - sounds like it could be messy.
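For what it's worth, the per-chunk scheme Greg describes can be sketched
in a few lines (a toy model in Python, with zlib standing in for pglz;
the names and layout are illustrative, not the actual toast code):

```python
import zlib

CHUNK = 8192  # compress each 8 kB slice of the plaintext independently

def toast_compress(data: bytes):
    """Split data into fixed-size chunks and compress each one on its
    own.  Chunk i always covers logical bytes [i*CHUNK, (i+1)*CHUNK),
    so a reader can locate the chunk for any offset without
    decompressing its predecessors."""
    return [zlib.compress(data[i:i + CHUNK])
            for i in range(0, len(data), CHUNK)]

def toast_read(chunks, offset, length):
    """Random access: decompress only the chunks that overlap the
    requested byte range."""
    first, last = offset // CHUNK, (offset + length - 1) // CHUNK
    buf = b"".join(zlib.decompress(chunks[i])
                   for i in range(first, last + 1))
    start = offset - first * CHUNK
    return buf[start:start + length]

data = bytes(range(256)) * 200          # 51,200 bytes of sample data
chunks = toast_compress(data)
assert toast_read(chunks, 10000, 50) == data[10000:10050]
```

As Greg notes, compressing 8 kB windows independently gives up some
ratio versus compressing the whole datum, but buys random access and
lets decompression overlap the I/O.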
>> or if compression doesn't seem to be working for the first few blocks, the
>> later blocks can be stored uncompressed? Or is that too complicated compared
>> to what we have now? :-)
> Actually we do that now, it was part of the same patch we're discussing.
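A per-chunk version of that fallback might look like the following toy
sketch (zlib again standing in for pglz, and a one-byte flag standing in
for the header bits; the existing patch Greg refers to makes this
decision at the whole-datum level):

```python
import os
import zlib

def maybe_compress(chunk: bytes, min_saving=0.25):
    """Store a chunk compressed only if it shrinks by at least
    min_saving; otherwise keep it raw.  A leading flag byte records
    which form was stored."""
    z = zlib.compress(chunk)
    if len(z) <= len(chunk) * (1.0 - min_saving):
        return b"\x01" + z
    return b"\x00" + chunk

def expand(stored: bytes) -> bytes:
    """Undo maybe_compress, consulting the flag byte."""
    if stored[:1] == b"\x01":
        return zlib.decompress(stored[1:])
    return stored[1:]

text = b"a" * 8192          # highly compressible: stored compressed
noise = os.urandom(8192)    # incompressible: stored raw
assert expand(maybe_compress(text)) == text
assert expand(maybe_compress(noise)) == noise
```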
Mark Mielke <mark(at)mielke(dot)cc>