Re: Compression and on-disk sorting

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Andrew Piskorski <atp(at)piskorski(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Compression and on-disk sorting
Date:	2006-05-17 19:30:54
Message-ID:	24764.1147894254@sss.pgh.pa.us
Lists:	pgsql-hackers pgsql-patches

Greg Stark <gsstark(at)mit(dot)edu> writes:
> The ideal way to handle the situation you're describing would be to interleave
> the tuples so that you have all 1000 values of the first column, followed by
> all 1000 values of the second column and so on. Then you run a generic
> algorithm on this and it achieves very high compression rates since there are
> a lot of repeating patterns.

It's not obvious to me that that yields a form more compressible than
what we have now. As long as the previous value is within the lookback
window, an LZ-style compressor will still be able to use it. More
importantly, the layout you describe would be unable to take advantage
of any cross-column correlation, which in real data is likely to be a
useful property for compression.
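
One way to check this empirically rather than argue it: a minimal sketch, assuming Python's standard zlib module (a DEFLATE implementation with a 32 KB lookback window) as a stand-in for "an LZ-style compressor". The table contents, the helper names (make_rows, row_major, column_major, compressed_size) and the tab/newline serialization are all invented for illustration and are not taken from any patch in this thread. The second column is fully determined by the first, so the data carries the kind of cross-column correlation mentioned above.

```python
import random
import zlib


def make_rows(n=1000, seed=0):
    """Build n hypothetical tuples; column 2 is fully determined by column 1."""
    rng = random.Random(seed)
    # Made-up city -> code mapping, used only to create cross-column correlation.
    cities = {"Boston": "02108", "Pittsburgh": "15213",
              "Austin": "73301", "Denver": "80202"}
    names = sorted(cities)
    rows = []
    for i in range(n):
        city = rng.choice(names)
        rows.append((city, cities[city], str(i % 50)))
    return rows


def row_major(rows):
    """Serialize tuple by tuple: one line per row, tab-separated fields."""
    return "\n".join("\t".join(r) for r in rows).encode()


def column_major(rows):
    """Serialize column by column: one line per column, tab-separated values."""
    return "\n".join("\t".join(col) for col in zip(*rows)).encode()


def compressed_size(data, level=6):
    """Size in bytes of the zlib-compressed form of data."""
    return len(zlib.compress(data, level))


rows = make_rows()
row_bytes = row_major(rows)
col_bytes = column_major(rows)

print("raw bytes (row, column):", len(row_bytes), len(col_bytes))
print("row-major compressed:   ", compressed_size(row_bytes))
print("column-major compressed:", compressed_size(col_bytes))
```

Both layouts contain exactly the same field bytes and the same number of separators, so any difference in compressed size comes from ordering alone. Which layout wins depends on the data, the tuple width relative to the window, and the compressor, so a run of this sketch on synthetic data says nothing definitive about real sort tapes; it is only a harness for trying the comparison on data of interest.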

regards, tom lane

In response to

Re: Compression and on-disk sorting at 2006-05-17 17:01:11 from Greg Stark