Re: GiST index performance

From: Yeb Havinga <yebhavinga(at)gmail(dot)com>
To: Matthew Wakeling <matthew(at)flymine(dot)org>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-performance(at)postgresql(dot)org
Subject: Re: GiST index performance
Date: 2010-03-19 20:20:49
Message-ID: 4BA3DCA1.2010808@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Yeb Havinga wrote:
>>>>> Matthew Wakeling wrote:
>>>>>> A second quite distinct issue is the general performance of GiST
>>>>>> indexes
>>>>>> which is also mentioned in the old thread linked from Open Items.
>>>>>> For
>>>>>> that, we have a test case at
>>>>>> http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php
>>>>>> for
>>>>>> btree_gist indexes. I have a similar example with the bioseg GiST
>>>>>> index. I
>>>>>> have completely reimplemented the same algorithms in Java for
>>>>>> algorithm
>>>>>> investigation and instrumentation purposes, and it runs about a
>>>>>> hundred
>>>>>> times faster than in Postgres. I think this is a problem, and I'm
>>>>>> willing
>>>>>> to do some investigation to try and solve it.
More gist performance..

Some background: I've been involved in developing several datatypes that
make use of gist indexes (we just published a paper on it, see
http://arxiv.org/abs/1003.3370), that's the main reason I'm very much
interested in possible improvements in gist index speed. One of the
datatypes was 'very badly' indexable: non leaf pages were getting very
general keys, so in the end we could see from the scanning times
compared to sequential scans that the whole index was being scanned. One
thing I remember was the idea that somehow it would be nice if the dept
of the gist tree could be fiddled with: in that case keys of non leaf
nodes would be more specific again. In the original Guttman R-tree paper
there was mention of a parameter that determined the size of entries in
nodes and thereby indirectly the depth of the tree. I missed that in the
PostgreSQL gist api.

One thing Gist scanning does very often is key comparisons. So another
approach is to try to limit those and again this might be possible by
increasing the depth / decrease number of entries per page. I just did a
test where in gistfitpage the gistpagesize was divided by 5 and
something similar in gistnospace.

Scantime before adjustment: about 70 seconds.
After adjustment: 45 seconds.

With gist_stat from the gevel package confirmed that the depth was now 4
(original 3). Drawback: bigger index size because pages are not filled
completely anymore.

The explain shows (actual time=0.030..0.032) for the inner loop times,
which is less then half of the original scan time.

Since the gistpagesize is derived from the database blocksize, it might
be wise to set the blocksize low for this case, I'm going to play with
this a bit more.

regards,
Yeb Havinga

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Yeb Havinga 2010-03-19 20:49:30 Re: GiST index performance
Previous Message Tom Lane 2010-03-19 19:49:22 Re: PG using index+filter instead only use index