Re: Very bad FTS performance with the Polish config

From: Sushant Sinha <sushant354(at)gmail(dot)com>
To: Wojciech Knapik <webmaster(at)wolniartysci(dot)pl>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Very bad FTS performance with the Polish config
Date: 2009-11-19 04:29:12
Message-ID: 9fb559330911182029p67e5d282r1941d929ceb66246@mail.gmail.com
Lists: pgsql-hackers

ts_headline calls the equivalent of ts_lexize to break the text into
lexemes. Of course, there is also an algorithm that processes the tokens and
generates the headline. I would be really surprised if that algorithm were
somehow dependent on the language, as it only processes the tokens. So Oleg
is right when he says ts_lexize is the thing to check.

I will try to replicate what you are doing, but in the meantime could you
run the same ts_headline query under psql several times and paste the results?
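
As a rough way to see why the scaling matters, here is a small Python sketch
(an illustration only, not a measurement). It takes the slowdown figures
quoted below (about 4x, 6x and 7.5x for 1, 5 and 10 repetitions of the text)
and, assuming the 'english' time grows linearly with text length, estimates
how fast the 'polish' time grows. The names reported_ratio,
relative_polish_cost and growth_exponent are made up for this example.

```python
import math

# Approximate slowdown of 'polish' relative to 'english' as reported in this
# thread, keyed by how many times the lorem ipsum text was repeated.
reported_ratio = {1: 4.0, 5: 6.0, 10: 7.5}


def relative_polish_cost(ratios):
    # ASSUMPTION: the 'english' time grows linearly with text length, so its
    # cost for n repetitions is n units. The 'polish' cost is then ratio * n.
    return {n: r * n for n, r in ratios.items()}


def growth_exponent(costs):
    # Slope of log(cost) against log(n) between the smallest and largest n.
    # 1.0 means linear growth; anything above 1.0 is superlinear.
    n_lo, n_hi = min(costs), max(costs)
    return math.log(costs[n_hi] / costs[n_lo]) / math.log(n_hi / n_lo)


costs = relative_polish_cost(reported_ratio)
exponent = growth_exponent(costs)
print(costs)
print(round(exponent, 2))
```

Under that assumption the exponent comes out at roughly 1.27, i.e. somewhat
worse than linear. If the 'english' time is itself superlinear, the real
exponent would be higher, so actual timings from psql are still needed.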

-Sushant.

2009/11/19 Wojciech Knapik <webmaster(at)wolniartysci(dot)pl>

>
> Oleg Bartunov wrote:
>
>>> Yes, for 4-word texts the results are similar.
>>> Try that with a longer text and the difference becomes more and more
>>> significant. For the lorem ipsum text, 'polish' is about 4 times slower
>>> than 'english'. For 5 repetitions of the text, it's 6 times, for 10
>>> repetitions - 7.5 times...
>>>
>>
>> Again, I see nothing unclear here, since dictionaries (as specified
>> in configuration) apply to ALL words in document. The more words in
>> document, the more overhead.
>>
>
> You're missing the point. I'm not surprised that the function takes more
> time for larger input texts - that's obvious. The thing is, the computation
> times rise more steeply when the Polish config is used. Steeply enough that
> the difference between the Polish and English configs becomes enormous in
> practical cases.
>
> Now this may be expected behaviour, but since I don't know if it is, I
> posted to the mailing lists to find out. If you're saying this is ok and
> there's nothing to fix here, then there's nothing more to discuss and we may
> consider the thread closed.
> If not, ts_headline deserves a closer look.
>
> cheers,
> Wojciech Knapik
>
