Re: [PERFORM] Slow TSearch2 performance for table with 1 million documents.

From: Benjamin Arai <me(at)benjaminarai(dot)com>
To: Benjamin Arai <benjamin(at)araisoft(dot)com>
Cc: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL <pgsql-general(at)postgresql(dot)org>, pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] Slow TSearch2 performance for table with 1 million documents.
Date: 2007-10-11 14:52:29
Message-ID: 3E8290A0-05DA-4F0F-A783-859124278D27@benjaminarai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance

It appears that the ORDER BY rank operation is the slowing factor.
If I remove it then the query is pretty fast. Is there another way
to perform ORDER BY such that it does not do a sort?

Benjamin

On Oct 5, 2007, at 3:57 PM, Benjamin Arai wrote:

>
> On Oct 5, 2007, at 8:32 AM, Oleg Bartunov wrote:
>
>> On Fri, 5 Oct 2007, Tom Lane wrote:
>>
>>> Benjamin Arai <benjamin(at)araisoft(dot)com> writes:
>>>> # explain analyze select * FROM fulltext_article, to_tsquery
>>>> ('simple','dog') AS q WHERE idxfti @@ q ORDER BY rank(idxfti,
>>>> q) DESC;
>>>
>>>> QUERY PLAN
>>>> -------------------------------------------------------------------
>>>> -----
>>>> -------------------------------------------------------------------
>>>> -----
>>>> ------------
>>>> Sort (cost=6576.74..6579.07 rows=933 width=774) (actual
>>>> time=12969.237..12970.490 rows=5119 loops=1)
>>>> Sort Key: rank(fulltext_article.idxfti, q.q)
>>>> -> Nested Loop (cost=3069.79..6530.71 rows=933 width=774)
>>>> (actual time=209.513..12955.498 rows=5119 loops=1)
>>>> -> Function Scan on q (cost=0.00..0.01 rows=1 width=32)
>>>> (actual time=0.005..0.006 rows=1 loops=1)
>>>> -> Bitmap Heap Scan on fulltext_article
>>>> (cost=3069.79..6516.70 rows=933 width=742) (actual
>>>> time=209.322..234.390 rows=5119 loops=1)
>>>> Recheck Cond: (fulltext_article.idxfti @@ q.q)
>>>> -> Bitmap Index Scan on
>>>> fulltext_article_idxfti_idx
>>>> (cost=0.00..3069.56 rows=933 width=0) (actual time=208.373..208.373
>>>> rows=5119 loops=1)
>>>> Index Cond: (fulltext_article.idxfti @@ q.q)
>>>> Total runtime: 12973.035 ms
>>>> (9 rows)
>>>
>>> The time seems all spent at the join step, which is odd because it
>>> really hasn't got much to do. AFAICS all it has to do is compute
>>> the
>>> rank() values that the sort step will use. Is it possible that
>>> rank() is really slow?
>>
>> can you try rank_cd() instead ?
>>
> Using Rank:
>
> -# ('simple','dog') AS q WHERE idxfti @@ q ORDER BY rank(idxfti,
> q) DESC;
>
> QUERY PLAN
> ----------------------------------------------------------------------
> ----------------------------------------------------------------------
> ----------------
> Sort (cost=6576.74..6579.07 rows=933 width=774) (actual
> time=98083.081..98084.351 rows=5119 loops=1)
> Sort Key: rank(fulltext_article.idxfti, q.q)
> -> Nested Loop (cost=3069.79..6530.71 rows=933 width=774)
> (actual time=479.122..98067.594 rows=5119 loops=1)
> -> Function Scan on q (cost=0.00..0.01 rows=1 width=32)
> (actual time=0.003..0.004 rows=1 loops=1)
> -> Bitmap Heap Scan on fulltext_article
> (cost=3069.79..6516.70 rows=933 width=742) (actual
> time=341.739..37112.110 rows=5119 loops=1)
> Recheck Cond: (fulltext_article.idxfti @@ q.q)
> -> Bitmap Index Scan on
> fulltext_article_idxfti_idx (cost=0.00..3069.56 rows=933 width=0)
> (actual time=321.443..321.443 rows=5119 loops=1)
> Index Cond: (fulltext_article.idxfti @@ q.q)
> Total runtime: 98087.575 ms
> (9 rows)
>
> Using Rank_cd:
>
> # explain analyze select * FROM fulltext_article, to_tsquery
> ('simple','cat') AS q WHERE idxfti @@ q ORDER BY rank_cd(idxfti,
> q) DESC;
>
> QUERY PLAN
> ----------------------------------------------------------------------
> ----------------------------------------------------------------------
> -----------------
> Sort (cost=6576.74..6579.07 rows=933 width=774) (actual
> time=199316.648..199324.631 rows=26054 loops=1)
> Sort Key: rank_cd(fulltext_article.idxfti, q.q)
> -> Nested Loop (cost=3069.79..6530.71 rows=933 width=774)
> (actual time=871.428..199244.330 rows=26054 loops=1)
> -> Function Scan on q (cost=0.00..0.01 rows=1 width=32)
> (actual time=0.006..0.007 rows=1 loops=1)
> -> Bitmap Heap Scan on fulltext_article
> (cost=3069.79..6516.70 rows=933 width=742) (actual
> time=850.674..50146.477 rows=26054 loops=1)
> Recheck Cond: (fulltext_article.idxfti @@ q.q)
> -> Bitmap Index Scan on
> fulltext_article_idxfti_idx (cost=0.00..3069.56 rows=933 width=0)
> (actual time=838.120..838.120 rows=26054 loops=1)
> Index Cond: (fulltext_article.idxfti @@ q.q)
> Total runtime: 199338.297 ms
> (9 rows)
>
>>
>>>
>>> regards, tom lane
>>>
>>> ---------------------------(end of
>>> broadcast)---------------------------
>>> TIP 5: don't forget to increase your free space map settings
>>>
>>
>> Regards,
>> Oleg
>> _____________________________________________________________
>> Oleg Bartunov, Research Scientist, Head of AstroNet
>> (www.astronet.ru),
>> Sternberg Astronomical Institute, Moscow University, Russia
>> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
>> phone: +007(495)939-16-83, +007(495)939-23-83
>>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Nis Jørgensen 2007-10-11 14:52:52 Re: Calculation of per Capita on-the-fly - problems with SQL syntax
Previous Message Tom Lane 2007-10-11 14:44:17 Re: XMIN semantic at peril ?

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2007-10-11 15:53:05 Re: [PERFORM] Slow TSearch2 performance for table with 1 million documents.
Previous Message Tom Lane 2007-10-11 14:51:35 Re: Huge amount of memory consumed during transaction