Re: [GENERAL] Creation of tsearch2 index is very slow

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stephan Vollmer <svollmer(at)gmx(dot)de>, pgsql-performance(at)postgreSQL(dot)org
Subject: Re: [GENERAL] Creation of tsearch2 index is very slow
Date: 2006-01-20 22:33:27
Message-ID: 20060120223327.GH31908@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance

On Fri, Jan 20, 2006 at 04:50:17PM -0500, Tom Lane wrote:
> (hemdistcache calls hemdistsign --- I think gprof is doing something
> funny with tail-calls here, and showing hemdistsign as directly called
> from gtsvector_picksplit when control really arrives through hemdistcache.)

It may be the compiler. All these functions are declared static, which
gives the compiler quite a bit of leeway to rearrange code.

> The bulk of the problem is clearly in this loop, which performs O(N^2)
> comparisons to find the two entries that are furthest apart in hemdist
> terms:

Ah. A while ago someone came onto the list asking about bit strings
indexing[1]. If I'd known tsearch worked like this I would have pointed
him to it. Anyway, before he went off to implement it he mentioned
"Jarvis-Patrick clustering", whatever that means.

Probably more relevent was this thread[2] on -hackers a while back with
pseudo-code[3]. How well it works, I don't know, it worked for him
evidently, he went away happy...

[1] http://archives.postgresql.org/pgsql-general/2005-11/msg00473.php
[2] http://archives.postgresql.org/pgsql-hackers/2005-11/msg01067.php
[3] http://archives.postgresql.org/pgsql-hackers/2005-11/msg01069.php

Hope this helps,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Marko Kreen 2006-01-20 22:41:40 Re: Page-Level Encryption
Previous Message Ron 2006-01-20 22:29:46 Re: [GENERAL] Creation of tsearch2 index is very slow

Browse pgsql-performance by date

  From Date Subject
Next Message Ron 2006-01-20 22:46:34 Re: [GENERAL] Creation of tsearch2 index is very slow
Previous Message Ron 2006-01-20 22:29:46 Re: [GENERAL] Creation of tsearch2 index is very slow