From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
Cc: | Wojciech Knapik <webmaster(at)wolniartysci(dot)pl>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Very bad FTS performance with the Polish config |
Date: | 2009-11-18 15:38:20 |
Message-ID: | 162867790911180738u17d1b6e1o3cf43062882b5e20@mail.gmail.com |
Lists: | pgsql-hackers |
2009/11/18 Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>:
> On Wed, 18 Nov 2009, Wojciech Knapik wrote:
>
>>
>>> Your polish_english and polish configurations use an ispell dictionary
>>> and are slow, while the english configuration doesn't contain ispell.
>>> So what is your complaint? Try adding an ispell dictionary to the
>>> english configuration and compare the timings.
>>
>> Oh, so this is not anomalous? These are the expected speeds for an ispell
>> dictionary? I didn't realize that. Sorry for the bother then. It just
>> seemed way too slow to be practical.
>
> You can see the real timings by calling the ts_lexize() function with
> different dictionaries (try several times to avoid the cold-start problem)
> instead of ts_headline(), which involves other factors.
>
> On my test machine I see no real difference between the very simple
> dictionary and the French ispell and snowball dictionaries:
>
It depends on the language (and the dictionary size).

For Czech:
postgres=# select ts_lexize('simple','vody');
 ts_lexize
-----------
 {vody}
(1 row)

Time: 0.785 ms

postgres=# select ts_lexize('cspell','vody');
 ts_lexize
-----------
 {voda}
(1 row)

Time: 1.041 ms

I'm afraid Czech and Polish are very hard languages for ispell (they have
long affix files).

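
A single \timing reading of about 1 ms is mostly per-statement overhead, so
one way to compare dictionaries more fairly is to lexize many tokens in one
statement. The following is only a sketch under assumptions: the word list
and the repeat count are arbitrary, and 'cspell' is the Czech ispell
dictionary used above, which has to be configured in the database first
(substitute any dictionary you have, e.g. the built-in 'english_stem').

```sql
-- psql: print the elapsed time of every statement
\timing on

-- Warm-up: the first ts_lexize() call in a session has to load the
-- dictionary, so ignore the timing of this statement.
SELECT ts_lexize('cspell', 'vody');

-- Lexize 3 words x 10000 repetitions in a single statement.
-- The token comes from a column rather than a literal so that the
-- planner cannot pre-evaluate the call once as a constant.
SELECT count(ts_lexize('cspell', w.token))
FROM (VALUES ('vody'::text), ('voda'::text), ('vodou'::text)) AS w(token),
     generate_series(1, 10000) AS g(i);

-- Baseline: the same workload against the built-in 'simple' dictionary.
SELECT count(ts_lexize('simple', w.token))
FROM (VALUES ('vody'::text), ('voda'::text), ('vodou'::text)) AS w(token),
     generate_series(1, 10000) AS g(i);
```

Dividing each reported time by 30000 gives a rough per-token cost for the
dictionary, and the difference between the two results shows how much the
ispell lookup adds over 'simple'.
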
Regards
Pavel
> dev-oleg=# select ts_lexize('simple','voila');
> ts_lexize
> -----------
> {voila}
> (1 row)
>
> Time: 0.282 ms
> dev-oleg=# select ts_lexize('simple','voila');
> ts_lexize
> -----------
> {voila}
> (1 row)
>
> Time: 0.269 ms
>
> dev-oleg=# select ts_lexize('french_stem','voila');
> ts_lexize
> -----------
> {voil}
> (1 row)
>
> Time: 0.187 ms
>
> I see no big difference in ts_headline either:
>
> dev-oleg=# select ts_headline('english','I can do voila', 'voila'::tsquery);
> ts_headline
> -----------------------
> I can do <b>voila</b>
> (1 row)
>
> Time: 0.265 ms
> dev-oleg=# select ts_headline('nomaofr','I can do voila', 'voila'::tsquery);
> ts_headline
> -----------------------
> I can do <b>voila</b>
> (1 row)
>
> Time: 0.299 ms
>
> This is PostgreSQL version 8.4.1.
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83