Re: Very bad FTS performance with the Polish config

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Wojciech Knapik <webmaster(at)wolniartysci(dot)pl>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Very bad FTS performance with the Polish config
Date: 2009-11-18 09:11:19
Message-ID: Pine.LNX.4.64.0911181208290.6801@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Wojciech,

your polish_english, polish configurations uses ispell language and slow,
while english configuration doesn't contains ispell. So, what's your
complains ? Try add ispell dictionary to english configuration and see
timings.

Oleg

On Wed, 18 Nov 2009, Wojciech Knapik wrote:

>
> Hello
>
>
> This has been discussed in #postgresql and posted to -performance a
> couple days ago, but no solution has been found. The discussion can be
> found here:
> http://archives.postgresql.org/pgsql-performance/2009-11/msg00162.php
>
> I just finished implementing a "search engine" for my site and found
> ts_headline extremely slow when used with a Polish tsearch
> configuration, while fast with English. All of it boils down to a simple
> testcase, but first some background.
>
> I tested on 8.3.1 on G5/OSX 10.5.8 and Xeon/Gentoo AMD64-2008.0
> (2.6.21), then switched both installations to 8.3.8 (both packages
> compiled from source, but provided by the distro - port/emerge). The
> Polish dictionaries and config were created according to this article
> (it's in Polish, but the code is self-explanatory):
>
> http://www.depesz.com/index.php/2008/04/22/polish-tsearch-in-83-polski-tsearch-w-postgresie-83/
>
> Now for the testcase:
>
> text = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
> eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
> minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip
> ex ea commodo consequat. Duis aute irure dolor in reprehenderit in
> voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur
> sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt
> mollit anim id est laborum.'
>
> # explain analyze select ts_headline('polish', text,
> plainto_tsquery('polish', 'foobar'));
> QUERY PLAN
> ------------------------------------------------------------------------------------
> Result (cost=0.00..0.01 rows=1 width=0) (actual time=6.407..6.470
> rows=1 loops=1)
> Total runtime: 6.524 ms
> (2 rows)
>
> # explain analyze select ts_headline('english', text,
> plainto_tsquery('english', 'foobar'));
> QUERY PLAN
> ------------------------------------------------------------------------------------
> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.861..0.895
> rows=1 loops=1)
> Total runtime: 0.935 ms
> (2 rows)
>
> # explain analyze select ts_headline('simple', text,
> plainto_tsquery('simple', 'foobar'));
> QUERY PLAN
> ------------------------------------------------------------------------------------
> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.627..0.660
> rows=1 loops=1)
> Total runtime: 0.697 ms
> (2 rows)
>
> #
>
> As you can see, the results differ by an order of magnitude between
> Polish and English. While in this simple testcase it's a non-issue, in
> the real world this translates into enormous overhead.
>
> One of the queries I ran testing my site's search function took
> 1870ms. When I took that query and changed all ts_headline(foo) calls to
> just foo, the time dropped below 100ms. That's the difference between
> something completely unacceptable and something quite useful.
>
> I can post various details about the hardware, software and specific
> queries, but the testcases speak for themselves. I'm sure you can easily
> reproduce my results.
>
> I'm putting my code into production tomorrow, since I can't wait
> anymore. Hints would be very much appreciated!
>
>
> cheers,
> Wojciech Knapik
>
> PS. This issue is not related to the loading time of dictionaries, or
> calls to ts_headline for results that won't be displayed. A few other
> details can be found here
> http://pastie.textmate.org/private/hqnqfnsfsknjyjlffzmog along with
> snippets of my conversations in #postgresql that lead to this testcase.
> Big thanks to RhodiumToad for helping me with fts for the last couple
> days ;]
>
>
>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2009-11-18 09:11:59 Re: UTF8 with BOM support in psql
Previous Message Itagaki Takahiro 2009-11-18 08:46:02 Re: TRIGGER with WHEN clause