Re: pg_trgm performance

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: pg_trgm performance
Date: 2007-02-26 12:53:16
Message-ID: Pine.LNX.4.64.0702261552120.400@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, 26 Feb 2007, Guillaume Smet wrote:

> On 2/24/07, Steinar H. Gunderson <sgunderson(at)bigfoot(dot)com> wrote:
>
> Thanks for your time.
>
>> GiN version, short:
>> -> Bitmap Heap Scan on tags (cost=8.64..151.79 rows=41 width=0)
>> (actual time=5.555..30.157 rows=7 loops=1)
>> Filter: (title % 'foo'::text)
>> -> Bitmap Index Scan on trgm_idx (cost=0.00..8.63 rows=41
>> width=0) (actual time=2.857..2.857 rows=5555 loops=1)
>> Index Cond: (title % 'foo'::text)
>
> This is currently the worst case in the gist - gin comparison because
> in the index scan, gin version doesn't have the length of the indexed
> string. So it returns a lot of rows which have every trigram of your
> search string but has in fact a low similarity due to the length of
> the indexed string (5555 rows -> 7 rows).
> It cannot be fixed at the moment due to the way GIN indexes work.
>
>> So, the GiN version seems to be a bit faster for long queries, but it's
>> still
>> too slow -- in fact, _unindexed_ versions give 141ms, 342ms, 725ms for
>> these
>> three queries, so for the longer queries, the gain is only about a factor
>> two. (By the way, I would like to stress that this is not my personal music
>> collection! :-P)
>
> The fact is that pg_trgm is designed to index words and not to index
> long sentences. I'm not that surprised it's slow in your case.
>
> It's also my case but following the instructions in README.pg_trgm I
> created a dictionary of words using tsearch2 (stat function) and I use
> pg_trgm on this dictionary to find similar words in my dictionary.
>
> For example, I rewrite the search:
> auberge cevenes
> as:
> (auberge | auberges | aubberge | auberg) & (ceven | cene | cevenol |
> cevennes)
> using pg_trgm and my query can find Auberge des C?vennes (currently
> it's limited to the 4th most similar words but I can change it
> easily).

Did you rewrite query manually or use rewrite feature of tsearch2 ?

>
> --
> Guillaume
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Guillaume Smet 2007-02-26 12:58:20 Re: pg_trgm performance
Previous Message Guillaume Smet 2007-02-26 12:42:40 Re: pg_trgm performance