Skip site navigation (1) Skip section navigation (2)

Re: Full text search with ORDER BY performance issue

From: Krade <krade(at)krade(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Full text search with ORDER BY performance issue
Date: 2009-07-20 20:48:54
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-performance
Hello, thanks for your replies.

On 7/20/2009 13:12, Oleg Bartunov wrote:

> Hmm, everything is already written in explain :) In the first query 
> 253635 rows should be readed from disk and sorted, while in the
> second query only 24 (random) rows readed from disk, so there is 4 
> magnitudes
> difference and in the worst case you should expected time for the 1st 
> query
> about 53*10^4 ms.
Yes, I do realize the first query is retrieving all the rows that match 
the full text search and sorting them, that's what I wanted to avoid. :) 
Since I only want 24 results at a time, I wanted to avoid having to get 
all the rows and sort them. I was wondering if there was any way to use, 
say, some index combination I'm not aware of, cluster the table 
according to an index or using a different query to get the same results.

Well, to take advantage of the gin index on (timestamp, comment_tsv), I 
suppose could do something like this:
archive=> explain analyze select * from a where comment_tsv @@ 
plainto_tsquery('love') and timestamp > cast(floor(extract(epoch from 
CURRENT_TIMESTAMP) - 864000) as integer) order by timestamp limit 24 
offset 0;

Limit (cost=17326.69..17326.75 rows=24 width=281) (actual 
time=3249.192..3249.287 rows=24 loops=1)
-> Sort (cost=17326.69..17337.93 rows=4499 width=281) (actual 
time=3249.188..3249.221 rows=24 loops=1)
Sort Key: "timestamp"
Sort Method: top-N heapsort Memory: 39kB
-> Bitmap Heap Scan on a (cost=408.80..17201.05 rows=4499 width=281) 
(actual time=3223.890..3240.484 rows=5525 loops=1)
Recheck Cond: (("timestamp" > (floor((date_part('epoch'::text, now()) - 
864000::double precision)))::integer) AND (comment_tsv @@ 
-> Bitmap Index Scan on timestamp_comment_gin (cost=0.00..407.67 
rows=4499 width=0) (actual time=3222.769..3222.769 rows=11242 loops=1)
Index Cond: (("timestamp" > (floor((date_part('epoch'::text, now()) - 
864000::double precision)))::integer) AND (comment_tsv @@ 
Total runtime: 3249.957 ms
(9 rows)

Which only looks at the last 10 days and is considerably faster. Not 
perfect, but a lot better. But this requires a lot of application logic, 
for example, if I didn't get 24 results in the first query, I'd have to 
reissue the query with a larger time interval and it gets worse pretty 
fast. It strikes me as a really dumb thing to do.

I'm really hitting a brick wall here, I can't seem to be able to provide 
reasonably efficient full text search that is ordered by date rather 
than random results from the database.

On 7/20/2009 13:22, Marcin Stępnicki wrote:
> What happens if you make it:
> select * from (
> select * from a where comment_tsv @@plainto_tsquery('love')
> ) xx
> order by xx.timestamp desc
> limit 24 offset 0;
> ?
Same query plan, I'm afraid.

In response to

pgsql-performance by date

Next:From: Kevin GrittnerDate: 2009-07-20 21:03:44
Subject: Re: Calling conventions
Previous:From: Tom LaneDate: 2009-07-20 19:54:03
Subject: Re: XMLPARSE() evaluated multiple times?

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group