| From: | Pierre Frédéric Caillaud <lists(at)peufeu(dot)com> | 
|---|---|
| To: | "Sam Mason" <sam(at)samason(dot)me(dot)uk>, pgsql-hackers(at)postgresql(dot)org | 
| Subject: | Re: Table and Index compression | 
| Date: | 2009-08-11 10:05:39 | 
| Message-ID: | op.uyhszpgkcke6l8@soyouz | 
| Lists: | pgsql-hackers | 
Well, here is the patch. I've included a README, which I paste here.

If someone wants to play with it (after the CommitFest...) feel free to do so.

While it was an interesting thing to try, I don't think it has enough potential to justify more effort...
* How to test
- apply the patch
- copy minilzo.c and minilzo.h to src/backend/storage/smgr
- configure & make
- enjoy
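The steps above could look roughly like this in a shell session. This is only an illustrative sketch: the directory layout and the patch file name inside the attached tarball are assumptions, not taken from the tarball itself.

```shell
# Assumed layout: the tarball extracted next to a PostgreSQL 8.4.0 tree.
tar xzf pg_8.4.0_compression_patch_v001.tar.gz
cd postgresql-8.4.0
patch -p1 < ../compression.patch          # patch file name assumed
cp ../minilzo.c ../minilzo.h src/backend/storage/smgr/
./configure
make
```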
* How it works
- the PostgreSQL block size is set to 32K
- an extra field is added to the page header giving the compressed length
THIS IS BAD: this information should be stored in a separate fork of the relation, because
	- it would then be backward compatible
	- the number of bytes to read from a compressed page would be known in advance
- the table file is sparse
- the page header is not compressed
- pages are written at their normal positions, but only the compressed bytes are written
- if compression gains nothing, the uncompressed page is stored
- the filesystem doesn't store the unwritten blocks
* Benefits
- sparse file holes are not cached, so OS disk cache efficiency is at least doubled
- random access is faster, since there is a better chance of hitting the cache (sometimes a bit faster, sometimes spectacularly so)
- yes, it does save space (> 50%)
* Problems
- biggest problem: any write that stores data compressing less well than what was there before can fail with a disk-full error
- ext3 sparse file handling isn't as fast as I wish it would be: on seq scans, even though it reads 2x less data and decompresses very fast, it's still slower...
- many seq scans (especially with aggregates) are CPU-bound anyway
- therefore, some kind of background reader/decompressor would be needed
- pre-allocation has to be done to avoid extreme fragmentation of the file, which rather defeats the purpose
- it still causes fragmentation
* Conclusion (for now)
It was a nice thing to try, but I believe it would be better if this were implemented directly in the filesystem, on the condition that it be implemented well (i.e. not like NTFS compression).
| Attachment | Content-Type | Size | 
|---|---|---|
| pg_8.4.0_compression_patch_v001.tar.gz | application/x-gzip | 26.9 KB | 