Skip site navigation (1) Skip section navigation (2)

Re: pg_trgm performance

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: pg_trgm performance
Date: 2007-02-26 12:53:16
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-performance
On Mon, 26 Feb 2007, Guillaume Smet wrote:

> On 2/24/07, Steinar H. Gunderson <sgunderson(at)bigfoot(dot)com> wrote:
> Thanks for your time.
>> GiN version, short:
>>    ->  Bitmap Heap Scan on tags  (cost=8.64..151.79 rows=41 width=0) 
>> (actual time=5.555..30.157 rows=7 loops=1)
>>          Filter: (title % 'foo'::text)
>>          ->  Bitmap Index Scan on trgm_idx  (cost=0.00..8.63 rows=41 
>> width=0) (actual time=2.857..2.857 rows=5555 loops=1)
>>                Index Cond: (title % 'foo'::text)
> This is currently the worst case in the gist - gin comparison because
> in the index scan, gin version doesn't have the length of the indexed
> string. So it returns a lot of rows which have every trigram of your
> search string but has in fact a low similarity due to the length of
> the indexed string (5555 rows -> 7 rows).
> It cannot be fixed at the moment due to the way GIN indexes work.
>> So, the GiN version seems to be a bit faster for long queries, but it's 
>> still
>> too slow -- in fact, _unindexed_ versions give 141ms, 342ms, 725ms for 
>> these
>> three queries, so for the longer queries, the gain is only about a factor
>> two. (By the way, I would like to stress that this is not my personal music
>> collection! :-P)
> The fact is that pg_trgm is designed to index words and not to index
> long sentences. I'm not that surprised it's slow in your case.
> It's also my case but following the instructions in README.pg_trgm I
> created a dictionary of words using tsearch2 (stat function) and I use
> pg_trgm on this dictionary to find similar words in my dictionary.
> For example, I rewrite the search:
> auberge cevenes
> as:
> (auberge | auberges | aubberge | auberg) & (ceven | cene | cevenol | 
> cevennes)
> using pg_trgm and my query can find Auberge des C?vennes (currently
> it's limited to the 4th most similar words but I can change it
> easily).

Did you rewrite query manually or use rewrite feature of tsearch2 ?

> --
> Guillaume
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

Oleg Bartunov, Research Scientist, Head of AstroNet (,
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su,
phone: +007(495)939-16-83, +007(495)939-23-83

In response to


pgsql-performance by date

Next:From: Guillaume SmetDate: 2007-02-26 12:58:20
Subject: Re: pg_trgm performance
Previous:From: Guillaume SmetDate: 2007-02-26 12:42:40
Subject: Re: pg_trgm performance

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group