Re: Very bad FTS performance with the Polish config

From: Wojciech Knapik <webmaster(at)wolniartysci(dot)pl>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Very bad FTS performance with the Polish config
Date: 2009-11-18 15:27:22
Message-ID: 4B04125A.50906@wolniartysci.pl
Lists: pgsql-hackers

Oleg Bartunov wrote:

>>> your polish_english, polish configurations uses ispell language
>>> and slow, while english configuration doesn't contains ispell.
>>> So, what's your complains ? Try add ispell dictionary to english
>>> configuration and see timings.
>>
>> Oh, so this is not anomalous ? These are the expected speeds for an
>> ispell dictionary ? I didn't realize that. Sorry for the bother
>> then. It just seemed way too slow to be practical.
>
> You can see real timings using ts_lexize() function for different
> dictionaries (try several time to avoid cold-start problem) instead
> of ts_headline(), which involves other factors.
>
> On my test machine I see no real difference between very simple
> dictionary and french ispell, snowball dictionaries:

ts_lexize seems to be just as fast for simple, polish_ispell and
english_stem with the 'voila' argument.

polish_ispell is in fact *faster* for the lorem ipsum text repeated a
couple of times (10?), which suggests that the issue is with ts_headline
itself.

> I see no big difference in ts_headline as well:
>
> dev-oleg=# select ts_headline('english','I can do voila',
> 'voila'::tsquery);
> ts_headline
> -----------------------
> I can do <b>voila</b>
> (1 row)
>
> Time: 0.265 ms

Yes, for 4-word texts the results are similar.
Try that with a longer text and the difference becomes more and more
significant. For the lorem ipsum text, 'polish' is about 4 times slower
than 'english'. For 5 repetitions of the text, it's 6 times slower; for
10 repetitions, 7.5 times...
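
For anyone who wants to reproduce the comparison, here is a rough psql
sketch. It assumes a text search configuration named 'polish' and a
dictionary named 'polish_ispell' are already set up as discussed in this
thread; the short sample sentence and the repetition counts are
placeholders, not the exact text I measured with.

```sql
-- Toggle per-statement timing in psql.
\timing

-- Dictionary lookup alone; run each a few times to get past the cold start.
SELECT ts_lexize('simple', 'voila');
SELECT ts_lexize('english_stem', 'voila');
SELECT ts_lexize('polish_ispell', 'voila');   -- assumed dictionary name

-- Headline generation on progressively longer input.
-- Replace the sample sentence with the full lorem ipsum text.
SELECT ts_headline('english',
                   repeat('Lorem ipsum dolor sit amet. ', 10),
                   'ipsum'::tsquery);
SELECT ts_headline('polish',                  -- assumed configuration name
                   repeat('Lorem ipsum dolor sit amet. ', 10),
                   'ipsum'::tsquery);

-- Raise the repeat count (e.g. 50, 100) and compare how the two timings grow.
SELECT ts_headline('english',
                   repeat('Lorem ipsum dolor sit amet. ', 50),
                   'ipsum'::tsquery);
SELECT ts_headline('polish',
                   repeat('Lorem ipsum dolor sit amet. ', 50),
                   'ipsum'::tsquery);
```

If the ts_lexize timings stay close while the ts_headline ratio keeps
growing with the input length, that points at ts_headline rather than at
the dictionary.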

> This is 8.4.1 version of PostgreSQL.

And that was 8.3.8/OSX.

cheers,
Wojciech Knapik
