Re: Cube Index Size

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Nick Raj <nickrajjain(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cube Index Size
Date: 2011-06-01 12:18:22
Message-ID: BANLkTinRfzBz=ygsO+fckxN5sn62YVQ4qg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jun 1, 2011 at 3:37 PM, Heikki Linnakangas <
heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> My guess is that the picksplit algorithm performs poorly with that data.
> Unfortunately, I have no idea how to improve that.

The current cube picksplit function has no storage utilization guarantees,
while Guttman's original picksplit has them (if the size of one group reaches
some threshold, then all remaining entries go to the other group). Also, the
current picksplit is a mix of Guttman's linear and quadratic algorithms: it
picks seeds quadratically, but distributes entries linearly.

I see the following ways of solving the picksplit problem for cube:
1) Add storage utilization guarantees to the current picksplit. This may
increase overlaps, but should decrease index size.
2) Add storage utilization guarantees to the current picksplit and replace
the entry distribution algorithm with the quadratic one. Picksplit will take
more time, but it should give more stable and predictable results.
3) I have experimented with my own picksplit algorithm, which showed
pretty good results on the tests I've run. But the current implementation is
dirty and requires more testing.
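
To make option 1 concrete, here is a rough Python sketch of a Guttman-style
split: seeds are picked quadratically, entries are distributed by least
enlargement, and a minimum fill is enforced. This is not the cube code and
not my experimental algorithm; the names (union, volume, pick_seeds, split,
min_fill) and the box representation are assumptions made only for
illustration.

```python
from itertools import combinations

# Hypothetical representation: a box is (lows, highs), one coordinate per dimension.

def union(a, b):
    """Smallest box covering both a and b."""
    return (tuple(map(min, a[0], b[0])), tuple(map(max, a[1], b[1])))

def volume(box):
    v = 1.0
    for lo, hi in zip(*box):
        v *= hi - lo
    return v

def pick_seeds(boxes):
    """Quadratic seed pick: the pair that wastes the most volume if grouped."""
    def waste(pair):
        i, j = pair
        return volume(union(boxes[i], boxes[j])) - volume(boxes[i]) - volume(boxes[j])
    return max(combinations(range(len(boxes)), 2), key=waste)

def split(boxes, min_fill):
    """Split entry indexes into two groups of at least min_fill entries each.

    Assumes min_fill <= len(boxes) // 2.
    """
    s1, s2 = pick_seeds(boxes)
    groups = ([s1], [s2])
    covers = [boxes[s1], boxes[s2]]
    rest = [i for i in range(len(boxes)) if i not in (s1, s2)]
    for n, i in enumerate(rest):
        remaining = len(rest) - n  # entries not yet assigned, including i
        # Utilization guarantee: a group that needs every remaining entry
        # to reach min_fill gets them, regardless of enlargement.
        if len(groups[0]) + remaining <= min_fill:
            g = 0
        elif len(groups[1]) + remaining <= min_fill:
            g = 1
        else:
            # Otherwise take entries in order and pick the group whose
            # bounding box grows least.
            grow = [volume(union(covers[k], boxes[i])) - volume(covers[k])
                    for k in (0, 1)]
            g = 0 if grow[0] <= grow[1] else 1
        groups[g].append(i)
        covers[g] = union(covers[g], boxes[i])
    return groups
```

With five boxes clustered near the origin and one far outlier, min_fill=1
leaves the outlier alone in its group, while min_fill=2 forces the last
entry into the outlier's group. That is the trade-off mentioned in option 1:
better page utilization, at the price of a larger bounding box and possibly
more overlap.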

------
With best regards,
Alexander Korotkov.
