Re: WIP: Fast GiST index build

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Fast GiST index build
Date: 2011-08-11 07:35:46
Message-ID: CAPpHfdsE-=Zv8fN1g3x1bAyUHZnn-iTRd6mO_e12xmr1WxhYWg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 11, 2011 at 10:21 AM, Heikki Linnakangas <
heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> Split of an internal node works like this:
>
> 1. Gather all the existing tuples on the page, plus the new tuple being
> inserted.
> 2. Call picksplit on the tuples, to divide them into pages
> 3. Go through all tuples on the buffer associated with the page, and divide
> them into buffers on the new pages. This is done by calling penalty function
> on each buffered tuple.
>
> I wonder if it would be better for index quality to pass the buffered
> tuples to picksplit in the 2nd step, so that they too can affect the split
> decision. Maybe it doesn't make much difference in practice..

I had this idea. But:
1) Buffer contain much more tuples than page plus new tuple.
2) Picksplit method can easily be quadratic for example.

Let's see the complexity of picksplit algorithms:
1) geometric datatypes (point, box etc) - O(n) (BTW, I have serious doubts
about it, i.e. O(n*log(n)) algorithm can be in times better in many cases)
2) pg_trgm and fts - O(n^2)
3) seg - O(n*log(n))
4) cube - O(n^2)

Thus, I believe such feature should be an optional. We can try it as add-on
patch.

------
With best regards,
Alexander Korotkov.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2011-08-11 08:08:31 Re: sha1, sha2 functions into core?
Previous Message Peter Eisentraut 2011-08-11 07:16:42 Re: XMLATTRIBUTES vs. values of type XML