Re: Full text search with ORDER BY performance issue

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: Krade <krade(at)krade(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Full text search with ORDER BY performance issue
Date: 2009-07-29 14:02:11
Message-ID: 603c8f070907290702r2201e63k8c4c9ec7767dc42@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, Jul 20, 2009 at 8:12 AM, Oleg Bartunov<oleg(at)sai(dot)msu(dot)su> wrote:
>> Here's a couple of queries:
>>
>> archive=> explain analyze select * from a where  comment_tsv @@
>> plainto_tsquery('love') order by timestamp desc limit 24 offset 0;
>>
>> QUERY PLAN
>> ----------
>> Limit  (cost=453248.73..453248.79 rows=24 width=281) (actual
>> time=188441.047..188441.148 rows=24 loops=1)
>>  ->  Sort  (cost=453248.73..453882.82 rows=253635 width=281) (actual
>> time=188441.043..188441.079 rows=24 loops=1)
>>        Sort Key: "timestamp"
>>        Sort Method:  top-N heapsort  Memory: 42kB
>>        ->  Bitmap Heap Scan on a  (cost=17782.16..446166.02 rows=253635
>> width=281) (actual time=2198.930..187948.050 rows=256378 loops=1)
>>              Recheck Cond: (comment_tsv @@ plainto_tsquery('love'::text))
>>              ->  Bitmap Index Scan on timestamp_comment_gin
>> (cost=0.00..17718.75 rows=253635 width=0) (actual time=2113.664..2113.664
>> rows=259828 loops=1)
>>                    Index Cond: (comment_tsv @@
>> plainto_tsquery('love'::text))
>> Total runtime: 188442.617 ms
>> (9 rows)
>>
>> archive=> explain analyze select * from a where  comment_tsv @@
>> plainto_tsquery('love') limit 24 offset 0;
>>
>> QUERY PLAN
>> ----------
>> Limit  (cost=0.00..66.34 rows=24 width=281) (actual time=14.632..53.647
>> rows=24 loops=1)
>>  ->  Seq Scan on a  (cost=0.00..701071.49 rows=253635 width=281) (actual
>> time=14.629..53.588 rows=24 loops=1)
>>        Filter: (comment_tsv @@ plainto_tsquery('love'::text))
>> Total runtime: 53.731 ms
>> (4 rows)
>>
>> First one runs painfully slow.
>
> Hmm, everything is already written in explain :) In the first query 253635
> rows should be readed from disk and sorted, while in the
> second query only 24 (random) rows readed from disk, so there is 4
> magnitudes
> difference and in the worst case you should expected time for the 1st query
> about 53*10^4 ms.

If love is an uncommon word, there's no help for queries of this type
being slow unless the GIN index can return the results in order. But
if love is a common word, then it would be faster to do an index scan
by timestamp on the baserel and then treat comment_tsv @@
plainto_tsquery('love') as a filter condition. Is this a selectivity
estimation bug?

...Robert

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message PFC 2009-07-29 14:18:38 Re: Full text search with ORDER BY performance issue
Previous Message Merlin Moncure 2009-07-29 13:39:17 Re: hyperthreaded cpu still an issue in 8.4?