Re: WIP: Fast GiST index build

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Fast GiST index build
Date: 2011-08-07 19:28:10
Message-ID: CAPpHfdusd_b_dtvNY2uuJict6=VEUTGAS8ELXGck=o=kOdMnOw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

There is last version of patch. There is the list of most significant
changes in comparison with your version of patch:
1) Reference counting of path items was added. It should helps against
possible accumulation of path items.
2) Neighbor relocation was added.
3) Subtree prefetching was added.
4) Final emptying algorithm was reverted to the original one. My experiments
shows that typical number of passes in final emptying in your version of
patch is about 5. It may be significant itself. Also I haven't estimate of
number of passes and haven't guarantees that it will not be high in some
corner cases. I.e. I prefer more predictable single-pass algorithm in spite
of it's a little more complex.
5) Unloading even last page of node buffer from main memory to the disk.
Imagine that that with levelstep = 1 each inner node has buffer. It seems to
me that keeping one page of each buffer in memory may be memory consuming.

Open items I see at this moment:
1) I dislike my switching to buffering build method because it's based on
very unproven assumptions. And I didn't more reliable assumptions in
scientific papers while. I would like to replace it with something much
simplier. For example, switching to buffering build when regular build
actually starts to produce a lot of IO. For this approach implementation I
need to somehow detect actual IO (not just buffer read but miss of OS
cache).
2) I'm worrying about possible size of nodeBuffersTab and path items. If we
imagine extremely large tree with levelstep = 1 size of this datastructures
will be singnificant. And it's hard to predict that size without knowing of
tree size.

------
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
gist_fast_build-0.10.0.patch.gz application/x-gzip 24.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Kupershmidt 2011-08-07 20:00:49 Re: vacuumlo patch
Previous Message Tom Lane 2011-08-07 17:47:49 Yes, WaitLatch is vulnerable to weak-memory-ordering bugs