Re: FTS performance with the Polish config

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kenneth Marshall <ktm(at)rice(dot)edu>, Wojciech Knapik <webmaster(at)wolniartysci(dot)pl>, pgsql-performance(at)postgresql(dot)org
Subject: Re: FTS performance with the Polish config
Date: 2009-11-15 09:05:07
Message-ID: Pine.LNX.4.64.0911151201360.6801@sn.sai.msu.ru
Lists: pgsql-performance

On Sun, 15 Nov 2009, Pavel Stehule wrote:

> 2009/11/15 Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>:
>> Yes, as stated, the original author uses a Polish ispell dictionary.
>> An ispell dictionary is slow to load the first time. In real life it
>> should be no problem.
>>
>
> It is a problem. People who need fast access use the English config
> instead of Czech. It drops some features, but it is significantly faster.

Just don't use the ispell dictionary; the Czech snowball stemmer is as fast
as the English one.

An ispell dictionary (English or any other language, it doesn't matter) is
slow on the first load, but then it is cached for the session, so there is no
problem if you use persistent database connections, which are the de facto
standard for any serious project.
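
To illustrate the point about caching, here is a minimal toy model in Python. It is not PostgreSQL code: the Session class, the count_loads helper and the two-word Polish word list are all hypothetical, and the only assumption it encodes is the one stated above, that the expensive dictionary load happens once per connection and is reused afterwards.

```python
class Session:
    """Toy stand-in for one database connection; not PostgreSQL code."""

    def __init__(self):
        self._dictionary = None  # per-session cache, empty right after connect
        self.loads = 0           # how many times the expensive load ran

    def _load_dictionary(self):
        # Stands in for parsing the ispell dictionary and affix files,
        # which is the slow step. The word list here is hypothetical.
        self.loads += 1
        return {"koty": "kot", "domy": "dom"}

    def lexize(self, word):
        # Load on first use only; later calls in this session hit the cache.
        if self._dictionary is None:
            self._dictionary = self._load_dictionary()
        return self._dictionary.get(word, word)


def count_loads(queries, persistent):
    """Return how many dictionary loads `queries` lookups trigger."""
    shared = Session()
    loads = 0
    for _ in range(queries):
        # A persistent client reuses one session; otherwise reconnect each time.
        session = shared if persistent else Session()
        session.lexize("koty")
        if not persistent:
            loads += session.loads
    return shared.loads if persistent else loads
```

In this model, count_loads(5, persistent=True) triggers one load, while count_loads(5, persistent=False) triggers five, which is the difference between paying the startup cost once and paying it on every query.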

>
> Pavel
>
>> Oleg
>> On Sat, 14 Nov 2009, Pavel Stehule wrote:
>>
>>> 2009/11/14 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
>>>> Kenneth Marshall <ktm(at)rice(dot)edu> writes:
>>>>> On Sat, Nov 14, 2009 at 12:25:05PM +0100, Wojciech Knapik wrote:
>>>>>> I just finished implementing a "search engine" for my site and found
>>>>>> ts_headline extremely slow when used with a Polish tsearch
>>>>>> configuration, while fast with English.
>>>>
>>>>> The documentation for ts_headline() states:
>>>>> ts_headline uses the original document, not a tsvector summary, so it
>>>>> can be slow and should be used with care.
>>>>
>>>> That's true but the argument in the docs would apply just as well to
>>>> english or any other config.  So while Wojciech would be well advised
>>>> to try to avoid making a lot of calls to ts_headline, it's still curious
>>>> that it's so much slower in polish than english.  Could we see a
>>>> self-contained test case?
>>>
>>> Is it dictionary-based or stemmer-based?
>>>
>>> Dictionary-based FTS is very slow (on first load). At least Czech FTS
>>> is slow.
>>>
>>> regards
>>> Pavel Stehule
>>>
>>>>
>>>>                        regards, tom lane
>>>>
>>>> --
>>>> Sent via pgsql-performance mailing list
>>>> (pgsql-performance(at)postgresql(dot)org)
>>>> To make changes to your subscription:
>>>> http://www.postgresql.org/mailpref/pgsql-performance
>>>>
>>>
>>>
>>
>>        Regards,
>>                Oleg
>>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
