Re: GiST index performance

From: Yeb Havinga <yebhavinga(at)gmail(dot)com>
To: Matthew Wakeling <matthew(at)flymine(dot)org>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-performance(at)postgresql(dot)org
Subject: Re: GiST index performance
Date: 2010-03-19 16:31:02
Message-ID: 4BA3A6C6.80406@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Yeb Havinga wrote:
> Yeb Havinga wrote:
>> Matthew Wakeling wrote:
>>>> Matthew Wakeling wrote:
>>>>> A second quite distinct issue is the general performance of GiST
>>>>> indexes
>>>>> which is also mentioned in the old thread linked from Open Items. For
>>>>> that, we have a test case at
>>>>> http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php
>>>>> for
>>>>> btree_gist indexes. I have a similar example with the bioseg GiST
>>>>> index. I
>>>>> have completely reimplemented the same algorithms in Java for
>>>>> algorithm
>>>>> investigation and instrumentation purposes, and it runs about a
>>>>> hundred
>>>>> times faster than in Postgres. I think this is a problem, and I'm
>>>>> willing
>>>>> to do some investigation to try and solve it.
>>> I have not made any progress on this issue. I think Oleg and Teodor
>>> would be better placed working it out. All I can say is that I
>>> implemented the exact same indexing algorithm in Java, and it
>>> performed 100 times faster than Postgres. Now, Postgres has to do a
>>> lot of additional work, like mapping the index onto disc, locking
>>> pages, and abstracting to plugin user functions, so I would expect
>>> some difference - I'm not sure 100 times is reasonable though. I
>>> tried to do some profiling, but couldn't see any one section of code
>>> that was taking too much time. Not sure what I can further do.
> Looked in the code a bit more - only the index nodes are compressed at
> index creation, the consistent functions does not compress queries, so
> not pallocs there. However when running Mathews example from
> http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php
> with the gist index, the coverage shows in gistget.c: 1000000 palloc0
> 's of gistsearchstack at line 152 and 2010982 palloc's also of the
> gistsearchstack on line 342. Two pfrees are also hit a lot: line 195:
> 1010926 of a stackentry and line 293: 200056 times. My $0.02 cents is
> that the pain is here. My knowledge of gistget or the other sources in
> access/gist is zero, but couldn't it be possible to determine the
> maximum needed size of the stack and then allocate it at once and use
> a pop/push kind off api?
Waisted some time today on a ghost chase... I though that removing the
millions of pallocs would help, so I wrote an alternative of the
gistsearchstack-stack to find out that it was not the index scanning
itself that caused milltions of pallocs, but the scan being in the inner
loop that was called 1000000 times. The actual scanning time was not
changed significantly.
The actual scanning time in my vm is for btree (actual
time=0.006..0.008) and gist (actual time=0.071..0.074). An error in my
searchstack alternative caused pages to be scanned twice, returing twice
the amount of rows (6 instead of 3 each time). This resulted in a
likewise increase of ms (actual time=0.075..0.150). Somewhere I hit
something that causes ~= 0.070 ms twice. For a single index scan,
0.070ms startup time for gist vs 0.006 for btree doesn't seem like a big
problem, but yeah when calling it a million times...

regards,
Yeb Havinga

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Justin Graf 2010-03-19 17:02:05 Re: too complex query plan for not exists query and multicolumn indexes
Previous Message Kevin Grittner 2010-03-19 13:58:56 Re: too complex query plan for not exists query and multicolumn indexes