GSoC 2011: Fast GiST index build

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: GSoC 2011: Fast GiST index build
Date: 2011-04-25 20:09:59
Message-ID: BANLkTi=kUBOX2e9TP6BmmPRDso70vn8KEw@mail.gmail.com
Lists: pgsql-hackers

Hackers!

I was happy to learn that my proposal "Fast GiST index build" was accepted to
GSoC 2011! Thank you very much for your support! Special thanks to Heikki
Linnakangas for becoming my mentor!

The first question that I would like to discuss is node buffer storage.
During index build, each index page (except leaf pages) should have several
pages of buffer. So my question is: where should the buffers be stored, and
how should we operate on them? It is somewhat similar to the GIN fastupdate
buffer, but there are differences. First, we have to take care of many buffers
instead of only one. Second, I believe that we don't need to worry about
concurrency so much, because the algorithm assumes relatively large operations
performed in memory (relocation of entries between several buffers). That
requires locking all of the buffers currently being operated on. I'm going to
store the buffers separately from the index itself, because we should free all
of them when the index is built.
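
To make the question more concrete, here is a rough sketch of the kind of
structure I have in mind. It is only an in-memory model in Python, not
PostgreSQL code: the class NodeBuffers, its methods and the tuples_per_page
parameter are all hypothetical names invented for this illustration, and it
says nothing about where the buffer pages would live on disk.

```python
from collections import defaultdict


class NodeBuffers:
    """Hypothetical model: buffers live apart from the index itself and are
    looked up by the block number of the index page they belong to."""

    def __init__(self, tuples_per_page):
        self.tuples_per_page = tuples_per_page
        # block number -> list of buffer pages; each page is a list of tuples
        self.buffers = defaultdict(list)

    def push(self, blkno, tup):
        """Append a tuple to the buffer of index page blkno."""
        pages = self.buffers[blkno]
        if not pages or len(pages[-1]) >= self.tuples_per_page:
            pages.append([])  # last buffer page is full, start a new one
        pages[-1].append(tup)

    def page_count(self, blkno):
        """Number of buffer pages currently attached to index page blkno."""
        return len(self.buffers.get(blkno, ()))

    def relocate(self, src, route):
        """Empty the buffer of src, moving every tuple to the buffer of the
        page chosen by route(tup). This is the bulk in-memory operation
        during which all involved buffers would have to be locked."""
        for page in self.buffers.pop(src, []):
            for tup in page:
                self.push(route(tup), tup)

    def free_all(self):
        """Drop every buffer once the index is built."""
        self.buffers.clear()
```

For example, after pushing five tuples into the buffer of page 1 with
tuples_per_page=2, that buffer holds three pages; relocate(1, route) then
spreads them over the buffers of whatever pages route() picks, and free_all()
discards everything at the end of the build.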

I found a very simple solution for dealing with varlena keys. The
greatest buffer size and the minimal level step are achieved when the key size
is minimal. Therefore, the minimal key size is the worst case. Since the
minimal varlena size is 4 bytes, we can use it in the initial calculations. I'm
going to stick to this assumption in the first implementation.
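
A small back-of-the-envelope sketch of why the minimal key gives the minimal
level step. This is only an illustration under assumed numbers, not the
formula the implementation will use: the page and tuple overheads are round
figures, alignment padding is ignored, and the rule "the level step is the
largest number of levels whose subtree still fits in memory" is a
simplification. The function names are hypothetical.

```python
PAGE_SIZE = 8192      # default PostgreSQL block size, in bytes
PAGE_OVERHEAD = 40    # assumed: page header plus GiST special space
TUPLE_OVERHEAD = 12   # assumed: index tuple header plus line pointer


def max_tuples_per_page(key_size):
    """Upper bound on the fanout for a given key size (padding ignored)."""
    return (PAGE_SIZE - PAGE_OVERHEAD) // (TUPLE_OVERHEAD + key_size)


def level_step(key_size, memory_pages):
    """Largest number of levels such that a subtree of that height, with
    every page full, has at most memory_pages pages on its lowest level.
    Never returns less than 1."""
    fanout = max_tuples_per_page(key_size)
    step = 1
    while fanout ** (step + 1) <= memory_pages:
        step += 1
    return step


# 128 MB of 8 kB pages
memory_pages = 16384
small = level_step(4, memory_pages)    # 4-byte key: fanout 509, step 1
large = level_step(100, memory_pages)  # 100-byte key: fanout 72, step 2
```

The smaller the key, the larger the fanout, so fewer levels fit in the same
memory. Calculating with the 4-byte key is therefore the safe choice for
variable-length keys.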

----
With best regards,
Alexander Korotkov.
