Re: Compression and on-disk sorting

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Andrew Piskorski <atp(at)piskorski(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Greg Stark <gsstark(at)mit(dot)edu>
Subject: Re: Compression and on-disk sorting
Date: 2006-05-17 17:01:11
Message-ID: 87ejysr3fs.fsf@stark.xeocode.com
Lists: pgsql-hackers pgsql-patches


Andrew Piskorski <atp(at)piskorski(dot)com> writes:

> Things like enums and 1 bit booleans certainly could be useful, but
> they cannot take advantage of duplicate values across multiple rows at
> all, even if 1000 rows have the exact same value in their "date"
> column and are all in the same disk block, right?

That's an interesting direction to go in. Generic algorithms would still help
in that case: since the identical value occurs more frequently than other
values, it would be encoded with a shorter symbol. But there's a limit to how
far that alone can compress the data.

The ideal way to handle the situation you're describing would be to interleave
the tuples so that you have all 1000 values of the first column, followed by
all 1000 values of the second column and so on. Then you run a generic
algorithm on this and it achieves very high compression rates since there are
a lot of repeating patterns.
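
As a rough illustration of the interleaving idea, here is a minimal Python
sketch. Everything in it is an assumption made for the example: the three-field
row layout (a 1-byte reading, a 4-byte date, a 4-byte region id), the synthetic
data, and the use of zlib as a stand-in for "a generic algorithm". It packs the
same 1000 rows once row by row and once column by column, then compresses both
buffers.

```python
import random
import struct
import zlib

# Hypothetical row layout: 1-byte reading, 4-byte date, 4-byte region id.
ROW = struct.Struct("<BII")


def make_rows(n=1000, seed=0):
    """Synthetic rows: a noisy reading, one shared date, regions in runs."""
    rng = random.Random(seed)
    date = 20060517  # the same "date" value in every row
    return [(rng.randrange(256), date, i // 100) for i in range(n)]


def row_major(rows):
    """Conventional layout: each tuple stored contiguously."""
    return b"".join(ROW.pack(*r) for r in rows)


def column_major(rows):
    """Interleaved layout: all readings, then all dates, then all regions."""
    readings, dates, regions = zip(*rows)
    n = len(rows)
    return (struct.pack(f"<{n}B", *readings)
            + struct.pack(f"<{n}I", *dates)
            + struct.pack(f"<{n}I", *regions))


def from_column_major(blob, n):
    """Rebuild the original tuples from the interleaved buffer."""
    readings = struct.unpack_from(f"<{n}B", blob, 0)
    dates = struct.unpack_from(f"<{n}I", blob, n)
    regions = struct.unpack_from(f"<{n}I", blob, 5 * n)
    return list(zip(readings, dates, regions))


if __name__ == "__main__":
    rows = make_rows()
    for name, blob in (("row-major", row_major(rows)),
                       ("column-major", column_major(rows))):
        print(name, len(blob), "bytes raw,",
              len(zlib.compress(blob)), "bytes compressed")
```

Both buffers hold exactly the same bytes in a different order. On data shaped
like this, the column-major buffer should compress smaller, because the
compressor sees each repeated column as one long run instead of having to
restart a match after every row's varying field. How large the gain is depends
entirely on the data and the compressor.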

I don't see how you'd build a working database with data in this form, however.
For example, a single insert would require updating small pieces of data
across the entire table. Perhaps there's some middle ground with interleaving
the tuples within a single compressed page, or something like that?
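
To make that middle ground concrete, here is a small Python sketch of what
per-page interleaving might look like. It is only an illustration under stated
assumptions, not a description of anything PostgreSQL does: the `PagedTable`
class, the `PAGE_ROWS` capacity, and the fixed schema of three unsigned 32-bit
columns are all hypothetical. Each page stores its rows column by column and is
compressed as a unit, so an insert rewrites only the last page.

```python
import struct
import zlib

PAGE_ROWS = 100  # hypothetical page capacity, in rows
NCOLS = 3        # hypothetical fixed schema: three unsigned 32-bit columns


def encode_page(rows):
    """Lay the page out column by column, then compress the whole page."""
    flat = [value for column in zip(*rows) for value in column]
    return zlib.compress(struct.pack(f"<H{len(flat)}I", len(rows), *flat))


def decode_page(blob):
    """Decompress one page and turn its columns back into row tuples."""
    raw = zlib.decompress(blob)
    (n,) = struct.unpack_from("<H", raw, 0)
    flat = struct.unpack_from(f"<{n * NCOLS}I", raw, 2)
    columns = [flat[c * n:(c + 1) * n] for c in range(NCOLS)]
    return list(zip(*columns))


class PagedTable:
    """Append-only table made of independently compressed pages."""

    def __init__(self):
        self.pages = []  # compressed page blobs, in insertion order

    def insert(self, row):
        # Only the last page is ever decompressed and rewritten.
        if self.pages:
            rows = decode_page(self.pages[-1])
            if len(rows) < PAGE_ROWS:
                self.pages[-1] = encode_page(rows + [row])
                return
        self.pages.append(encode_page([row]))

    def scan(self):
        for blob in self.pages:
            yield from decode_page(blob)
```

For example, inserting 250 rows of the form `(i, 20060517, i % 3)` leaves three
pages, and the first two are never touched again once they fill up. The
trade-off is that repetition is only exploited within a page, and every insert
pays for recompressing one page.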

--
greg
