Skip site navigation (1) Skip section navigation (2)

Re: pg_trgm performance

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: pg_trgm performance
Date: 2007-02-26 12:53:16
Message-ID: Pine.LNX.4.64.0702261552120.400@sn.sai.msu.ru (view raw or flat)
Thread:
Lists: pgsql-performance
On Mon, 26 Feb 2007, Guillaume Smet wrote:

> On 2/24/07, Steinar H. Gunderson <sgunderson(at)bigfoot(dot)com> wrote:
>
> Thanks for your time.
>
>> GiN version, short:
>>    ->  Bitmap Heap Scan on tags  (cost=8.64..151.79 rows=41 width=0) 
>> (actual time=5.555..30.157 rows=7 loops=1)
>>          Filter: (title % 'foo'::text)
>>          ->  Bitmap Index Scan on trgm_idx  (cost=0.00..8.63 rows=41 
>> width=0) (actual time=2.857..2.857 rows=5555 loops=1)
>>                Index Cond: (title % 'foo'::text)
>
> This is currently the worst case in the gist - gin comparison because
> in the index scan, gin version doesn't have the length of the indexed
> string. So it returns a lot of rows which have every trigram of your
> search string but has in fact a low similarity due to the length of
> the indexed string (5555 rows -> 7 rows).
> It cannot be fixed at the moment due to the way GIN indexes work.
>
>> So, the GiN version seems to be a bit faster for long queries, but it's 
>> still
>> too slow -- in fact, _unindexed_ versions give 141ms, 342ms, 725ms for 
>> these
>> three queries, so for the longer queries, the gain is only about a factor
>> two. (By the way, I would like to stress that this is not my personal music
>> collection! :-P)
>
> The fact is that pg_trgm is designed to index words and not to index
> long sentences. I'm not that surprised it's slow in your case.
>
> It's also my case but following the instructions in README.pg_trgm I
> created a dictionary of words using tsearch2 (stat function) and I use
> pg_trgm on this dictionary to find similar words in my dictionary.
>
> For example, I rewrite the search:
> auberge cevenes
> as:
> (auberge | auberges | aubberge | auberg) & (ceven | cene | cevenol | 
> cevennes)
> using pg_trgm and my query can find Auberge des C?vennes (currently
> it's limited to the 4th most similar words but I can change it
> easily).

Did you rewrite query manually or use rewrite feature of tsearch2 ?

>
> --
> Guillaume
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>

 	Regards,
 		Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

In response to

Responses

pgsql-performance by date

Next:From: Guillaume SmetDate: 2007-02-26 12:58:20
Subject: Re: pg_trgm performance
Previous:From: Guillaume SmetDate: 2007-02-26 12:42:40
Subject: Re: pg_trgm performance

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group