Re: Call for Google Summer of Code (GSoC) 2012: Project ideas?

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Andy Colson <andy(at)squeakycode(dot)net>
Cc: Stefan Keller <sfkeller(at)gmail(dot)com>, pgsql-general List <pgsql-general(at)postgresql(dot)org>
Subject: Re: Call for Google Summer of Code (GSoC) 2012: Project ideas?
Date: 2012-03-09 17:06:04
Message-ID: CAHyXU0zdvbnP39p1LJU5T1dPy0ZZLDT0dx3hTfCptCm1goxLzQ@mail.gmail.com
Lists: pgsql-general

On Fri, Mar 9, 2012 at 10:19 AM, Andy Colson <andy(at)squeakycode(dot)net> wrote:
> On 3/9/2012 9:47 AM, Merlin Moncure wrote:
>>
>> On Thu, Mar 8, 2012 at 2:01 PM, Andy Colson <andy(at)squeakycode(dot)net> wrote:
>>>
>>> I know toast compresses, but I believe it's only one row.  Page-level
>>> would compress better because there is more data, and it would also
>>> decrease the amount of IO, so it might speed up disk access.
>>
>>
>> er, but when data is toasted it spans pages anyway.  page-level
>> compression is a super complicated problem.
>>
>> something that is maybe more attainable on the compression side of
>> things is a userland api for compression -- like pgcrypto is for
>> encryption.  even if it didn't make it into core, it could live on
>> reasonably as a pgfoundry project.
>>
>> merlin
>
>
> Agreed, it's probably too difficult for a GSoC project.  But a userland
> api would still be row-level, which, in my opinion, is useless.  Consider
> rows from my apache log that I'm dumping to a database:

It's useless for what you're trying to do, but it would be useful to
people trying to compress large datums (data, I know) before storage
using algorithms that postgres can't ship in core, like lzo (licensing
keeps it out).
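
for example, something pgcrypto-ish could look like this (the lzo_*
function names are hypothetical, just sketching the shape of the api):

    -- hypothetical extension functions, analogous to pgcrypto:
    --   lzo_compress(bytea) returns bytea
    --   lzo_decompress(bytea) returns bytea
    CREATE TABLE blobs (id serial PRIMARY KEY, payload bytea);

    -- keep TOAST from re-compressing data we compressed ourselves:
    ALTER TABLE blobs ALTER COLUMN payload SET STORAGE EXTERNAL;

    INSERT INTO blobs (payload)
      VALUES (lzo_compress(convert_to('some large datum ...', 'UTF8')));

    SELECT convert_from(lzo_decompress(payload), 'UTF8')
      FROM blobs WHERE id = 1;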

> date, url, status
> 2012-3-9 10:15:00, '/index.php?id=4', 202
> 2012-3-9 10:15:01, '/index.php?id=5', 202
> 2012-3-9 10:15:02, '/index.php?id=6', 202
>
> That won't compress at all at the row level.  But it'll compress 99% at a
> "larger" (page/multirow/whatever/?) level.

sure, but you can only get those rates by giving up the segmented view
of the data that postgres requires.  your tuples are very small, and I
only see compression happening on the userland side through tricks
specific to your dataset (like using "char" to map the status, mapping
urls to ids, etc).
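
a minimal sketch of what I mean (schema is illustrative only):

    -- map repeating strings to small fixed-width keys:
    CREATE TABLE urls (
        id  serial PRIMARY KEY,
        url text UNIQUE
    );

    CREATE TABLE statuses (
        id     "char" PRIMARY KEY,       -- one byte per distinct status
        status smallint UNIQUE NOT NULL  -- 200, 202, 404, ...
    );

    CREATE TABLE access_log (
        ts        timestamp,
        url_id    integer REFERENCES urls(id),    -- 4 bytes vs the full url
        status_id "char" REFERENCES statuses(id)  -- 1 byte vs an int
    );

that captures much of the redundancy a page-level compressor would
find, without any server-side changes.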

merlin
