Re: Concurrence GiST

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Concurrence GiST
Date: 2004-02-16 09:59:31
Message-ID: 40309483.7080901@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Christopher Kings-Lynne wrote:
> Hey Teodor,
>
> How's this going?
>
> I think you were looking at the same paper I was reading about GiST
> indexes. I found the GiST source code somewhat over my head, however.
>
> I hope you'll still working on it and haven't given up!
>

I hoped begining of year will be quiet, but it's not. Our customers give to us
a lot of work... So I havn't a much time work with GiST. :(

Ok, I suppose that the basic papers is "Access methods for next-generation
database systems" by Marcel Kornaker and "Concurrency and Recovery in
Generalized Search Trees" by Kornaker, C.Mohan and Joseph M. Hellerstein.

But it seems to me it's not enough to us. When I began to work with GiST in
pgsql I found that split operation may fails with variable-size key. Just for
one reason: user-defined method pickSplit doesn't guarantee that size of free
space on new page will be enough for insertion of new key. For example: page
contains small keys which all equals and one - not (small too). We want to
insert a big key, so pickSplit is called. It distribute equals keys to one page
and different - to another and we want insert new key in first page - and we
hasn't enough free space. Contrib/intarray and contrib/tsearch* modules often
produce similar situation. For this reason, in current implementation gistSplit
(gist.c) method is recursive, and more - it splits 'virtual' page with already
inserted new key (look gist.c near 523 line).

As I can see in papers, it's algorithm isn't protected for a such case. So, now
I think on two directions:
1 How to adopt paper's insertion algorithm. But without success now :(
2 More simple algorithm, but with less concurrerncy based on 'update locks'
which described at http://www-db.stanford.edu/~ullman/dscb.html (I don't known
who was fisrt, but I readed about it in book).
Update lock looks as shared lock while asking and as exclusive while deducted.
Matrix of locks:
S X U
S y n y
X n n n
U n n n

So, insertion algorithm with two-phase locking:
Find leaf to insert key (with U locking all parent pages)
Define which parent will be changed ( let I call it P-page :) ).
Update lock to X all pages from P-Page to leaf page.
Release U-locks from root to P-page
Insert and update pages from P-page to leaf
Realese all locks.

So, the defect of this scheme is: nobody can start (but work with other
pages is possible) work with index while insert process locks root even if root
locked only with U lock. And we need to add U lock in lock manager of pgsql.

So, I still thinking. If you has other thought/idea, pls, don't be quiet.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Gibson 2004-02-16 10:50:38 Re: dblink - custom datatypes NOW work :)
Previous Message news.postgresql.org 2004-02-16 03:29:42 CHAR(n) always trims trailing spaces in 7.4