Re: WIP: Fast GiST index build

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Fast GiST index build
Date: 2011-07-25 18:52:16
Message-ID: 4E2DBB60.80604@enterprisedb.com
Lists: pgsql-hackers

On 22.07.2011 12:38, Alexander Korotkov wrote:
> Patch with my try to detect ordered datasets is attached. The implemented
> idea is described below.
> Index tuples are divided into chunks of 128. For each chunk we measure how
> many of the leaf pages where index tuples were inserted don't match those of
> the previous chunk. Based on statistics from several chunks we estimate the
> distribution of accesses between leaf pages (an exponential distribution law
> is assumed, and that seems to be an error). After that we can estimate the
> portion of index tuples which can be processed without actual IO. If this
> estimate exceeds the threshold then we should switch to buffering build.
> Now my implementation successfully detects randomly mixed datasets and well
> ordered datasets, but it seems to be too optimistic about intermediate
> cases. I believe it's due to a wrong assumption about the distribution law.
> Do you think this approach is acceptable? Probably there is some research
> on the distribution law for such cases (though I didn't find anything
> relevant in Google Scholar)?
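
The per-chunk measurement described in the quoted text can be sketched roughly as follows. This is only one reading of the description, not code from the patch: it counts, per tuple, whether the target leaf page was touched by the previous chunk of 128. The names chunk_miss_fraction and page_in_chunk and the flat array of leaf page numbers are assumptions made for illustration, and the distribution-fitting step is left out because the thread itself says the exponential assumption looks wrong.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CHUNK_SIZE 128          /* chunk size taken from the description */

/* Hypothetical helper: is 'page' among the first n entries of pages[]? */
static bool
page_in_chunk(const unsigned *pages, size_t n, unsigned page)
{
    for (size_t i = 0; i < n; i++)
        if (pages[i] == page)
            return true;
    return false;
}

/*
 * Hypothetical measurement: over all full chunks after the first, return
 * the fraction of insertions that went to a leaf page which the previous
 * chunk did not touch.  leaf_pages[i] is the leaf page number that the
 * i-th index tuple was inserted into.  Returns 0.0 when there are fewer
 * than two full chunks to compare.
 */
static double
chunk_miss_fraction(const unsigned *leaf_pages, size_t ntuples)
{
    size_t nchunks = ntuples / CHUNK_SIZE;
    size_t misses = 0;
    size_t total = 0;

    assert(leaf_pages != NULL || ntuples == 0);

    for (size_t c = 1; c < nchunks; c++)
    {
        const unsigned *prev = leaf_pages + (c - 1) * CHUNK_SIZE;
        const unsigned *cur = leaf_pages + c * CHUNK_SIZE;

        for (size_t i = 0; i < CHUNK_SIZE; i++)
        {
            if (!page_in_chunk(prev, CHUNK_SIZE, cur[i]))
                misses++;
            total++;
        }
    }
    return total ? (double) misses / (double) total : 0.0;
}
```

A well-ordered input that keeps hitting the same few leaf pages gives a fraction near 0, while a randomly mixed input on a large index gives a fraction near 1. How that number feeds the estimate and the threshold is exactly the open question in the quoted text.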

Great! It would be nice to find a more scientific approach to this, but
that's probably fine for now. It's time to start cleaning up the patch
for eventual commit.

You got rid of the extra page pins, which is good, but I wonder why you
still pre-create all the GISTLoadedPartItem structs for the whole
subtree in loadTreePart()? Can't you create those structs on-the-fly,
when you descend the tree? I understand that it's difficult to update
all the parent-pointers as trees are split, but it feels like there's
way too much bookkeeping going on. Surely it's possible to simplify it
somehow.
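
To make the on-the-fly idea concrete, here is a minimal sketch of allocating a node only when the descent first reaches it. Everything in it is hypothetical: PartItem is a stand-in and not the patch's GISTLoadedPartItem (whose fields are not shown in this thread), MAX_CHILDREN is an arbitrary fan-out, and plain calloc is used instead of PostgreSQL's memory contexts. It also does not address the hard part mentioned above, namely fixing up parent pointers when pages split.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_CHILDREN 4          /* arbitrary fan-out for this sketch */

/* Hypothetical stand-in for an in-memory item of the loaded tree part. */
typedef struct PartItem
{
    struct PartItem *parent;
    struct PartItem *children[MAX_CHILDREN];    /* NULL until first visit */
    int         level;                          /* 0 means leaf */
} PartItem;

/* Allocate a zeroed item linked to its parent; returns NULL on failure. */
static PartItem *
part_item_new(PartItem *parent, int level)
{
    PartItem   *item = calloc(1, sizeof(PartItem));

    if (item == NULL)
        return NULL;
    item->parent = parent;
    item->level = level;
    return item;
}

/*
 * Step down into child 'slot' of 'parent', creating the child item only on
 * the first visit.  Returns NULL for a leaf, a bad slot, or out of memory.
 */
static PartItem *
descend(PartItem *parent, int slot)
{
    assert(parent != NULL);

    if (slot < 0 || slot >= MAX_CHILDREN || parent->level == 0)
        return NULL;
    if (parent->children[slot] == NULL)
        parent->children[slot] = part_item_new(parent, parent->level - 1);
    return parent->children[slot];
}
```

With this shape, only the paths that are actually descended get items, so subtrees that are never visited cost nothing in bookkeeping.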

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
