Re: WIP: Fast GiST index build

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Fast GiST index build
Date: 2011-08-08 10:18:35
Message-ID: CAPpHfdsovwbgs30TojJc_zDsc+0ghapm+-AALfP58UaHiYjr2A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Aug 8, 2011 at 1:23 PM, Heikki Linnakangas <
heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>
> 2) Neighbor relocation was added.
>>
>
> Ok. I think we're going to need some sort of a heuristic on when to enable
> neighbor relocation. If I remember the performance tests correctly, it
> improves the quality of the resulting index, but incurs a significant CPU
> overhead.
>
> Actually, looking at the performance numbers on the wiki page again (
> http://wiki.postgresql.org/**wiki/Fast_GiST_index_build_**GSoC_2011<http://wiki.postgresql.org/wiki/Fast_GiST_index_build_GSoC_2011>),
> it looks like neighbor relocation doesn't help very much with the index
> quality - sometimes it even results in a slightly worse index. Based on
> those results, shouldn't we just remove it? Or is there some other data set
> where it helps significantly?

Oh, actually I didn't add some results with neighborrelocation = off. I
would like to rerun some tests with current version of patch.

> 3) Subtree prefetching was added.
>>
>
> I'm inclined to leave out the prefetching code for now. Unless you have
> some performance numbers that show that it's critical for the overall
> performance. But I don't think that was the case, it's just an additional
> optimization for servers with big RAID arrays. So, please separate that into
> an add-on patch. It needs to be performance tests and reviewed separately.

I though that prefetch helps even on separate hard disks by ordering of IOs.

> 4) Final emptying algorithm was reverted to the original one. My
>> experiments shows that typical number of passes in final emptying in
>> your version of patch is about 5. It may be significant itself. Also I
>> haven't estimate of number of passes and haven't guarantees that it will
>> not be high in some corner cases. I.e. I prefer more predictable
>> single-pass algorithm in spite of it's a little more complex.
>>
>
> I was trying to get rid of that complexity during index build. Some extra
> code in the final pass would be easier to understand than extra work that
> needs to be done through the index build. It's not a huge amount of code,
> but still.
>
> I'm not worried about the extra CPU overhead of scanning the data
> structures at the final pass. I guess in my patch you had to do extra I/O as
> well, because the buffers were not emptied in strict top-down order, so
> let's avoid that. How about:
>
> Track all buffers in the lists, not only those that are non-empty. Add the
> buffer to the right list at getNodeBuffer(). That way in the final stage,
> you need to scan through all buffers instead of just the non-empty ones. But
> the overhead of that to should be minimal in practice, scanning some
> in-memory data structures is pretty cheap compared to building an index.
> That way you wouldn't need to maintain the lists during the index build,
> except for adding each buffer to correct lists in getNodeBuffer().
>
> BTW, please use List for the linked lists. No need to re-implement the
> wheel.

Ok.

> 5) Unloading even last page of node buffer from main memory to the disk.
>> Imagine that that with levelstep = 1 each inner node has buffer. It
>> seems to me that keeping one page of each buffer in memory may be memory
>> consuming.
>>
>> Open items I see at this moment:
>> 1) I dislike my switching to buffering build method because it's based
>> on very unproven assumptions. And I didn't more reliable assumptions in
>> scientific papers while. I would like to replace it with something much
>> simplier. For example, switching to buffering build when regular build
>> actually starts to produce a lot of IO. For this approach implementation
>> I need to somehow detect actual IO (not just buffer read but miss of OS
>> cache).
>>
>
> Yeah, that's a surprisingly hard problem. I don't much like the method used
> in the patch either.

Is it possible to make buffering build a user defined option until we have a
better idea?

> 2) I'm worrying about possible size of nodeBuffersTab and path items. If
>> we imagine extremely large tree with levelstep = 1 size of this
>> datastructures will be singnificant. And it's hard to predict that size
>> without knowing of tree size.
>>
>
> I'm not very worried about that in practice. If you have a very large
> index, you presumably have a fair amount of memory too. Otherwise the
> machine is horrendously underpowered to build or do anything useful with the
> index anyway. Nevertheless it would nice to have some figures on that. If
> you have, say, an index of 1 TB in size, how much memory will building the
> index need?
>
I think with points it would be about 1 million of buffers and about 100-300
megabytes of RAM depending on space utilization. It may be ok, because 1 TB
is really huge index. But if maintenance_work_mem is low we can run out of
it. Though maintenance_work_mem is quite strange for system with 1 TB
indexes.

> Miscellaneous observations:
>
> * Please run pgindent over the code, there's a lot of spurious whitespace
> in the patch.
> * How about renaming GISTLoadedPartItem to something like
> GISTBulkInsertStack, to resemble the GISTInsertStack struct used in the
> normal insertion code. The "loaded part" nomenclature is obsolete, as the
> patch doesn't explicitly load parts of the tree into memory anymore. Think
> about the names of other structs, variables and functions too,
> GISTLoadedPartItem just caught my eye first but there's probably others that
> could have better names.
> * Any user-visible options need to be documented in the user manual.
> * And of course, make sure comments and the readme are up-to-date.
> * Compiler warning:
>
> reloptions.c:259: warning: initializer-string for array of chars is too
> long
> reloptions.c:259: warning: (near initialization for
> ‘stringRelOpts[0].default_val’**)
>
> I don't think there's a way to add an entry to stringRelOpts in a way that
> works correctly. That's a design flaw in the reloptions.c code that has
> never come up before, as there hasn't been any string-formatted relopts
> before (actually buffering option might be better served by an enum
> reloption too, if we had that). Please start a new thread on that on
> pgsql-hackers.

Ok.

------
With best regards,
Alexander Korotkov.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2011-08-08 10:27:33 Compiler warnings with stringRelOpts (was WIP: Fast GiST index build)
Previous Message Hannu Krosing 2011-08-08 10:07:07 Re: Transient plans versus the SPI API