Re: tsearch2 poor performance

From: Kris Kiger <kris(at)musicrebellion(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Subject: Re: tsearch2 poor performance
Date: 2004-10-01 22:01:48
Message-ID: 415DD3CC.3030208@musicrebellion.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-hackers

Hey all, its me again. If I do not do a count(product_id) on my
tsearch2 queries, its actually really fast, for example;

explain analyze SELECT product_id FROM product, to_tsquery('bear') AS q
WHERE vector @@ q LIMIT 1000;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..2081.60 rows=1000 width=4) (actual
time=2.308..51.522 rows=1000 loops=1)
-> Nested Loop (cost=0.00..6244798.55 rows=3000001 width=4) (actual
time=2.299..45.637 rows=1000 loops=1)
-> Function Scan on q (cost=0.00..12.50 rows=1000 width=32)
(actual time=0.023..0.023 rows=1 loops=1)
-> Index Scan using vector_idx on product (cost=0.00..6207.29
rows=3000 width=36) (actual time=2.253..37.946 rows=1000 loops=1)
Index Cond: (product.vector @@ "outer".q)
Filter: (product.vector @@ "outer".q)
Total runtime: 122.487 ms

explain analyze SELECT product_id FROM product, to_tsquery('complex') AS
q WHERE vector @@ q LIMIT 1000;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..2081.60 rows=1000 width=4) (actual
time=4.943..2325.949 rows=1000 loops=1)
-> Nested Loop (cost=0.00..6244798.55 rows=3000001 width=4) (actual
time=4.933..2319.885 rows=1000 loops=1)
-> Function Scan on q (cost=0.00..12.50 rows=1000 width=32)
(actual time=0.040..0.040 rows=1 loops=1)
-> Index Scan using vector_idx on product (cost=0.00..6207.29
rows=3000 width=36) (actual time=4.868..2311.087 rows=1000 loops=1)
Index Cond: (product.vector @@ "outer".q)
Filter: (product.vector @@ "outer".q)
Total runtime: 2329.389 ms

From what I know, there is only one reason I can offer why a count
takes approximately 30~40 seconds longer on these same queries... that
is that count has to evaluate whether a value is null or not. There
probably is a better reason, if anyone has any ideas, I would much
appreciate you sharing! Also, why the big difference in query times in
the above?

bear appears 780963 times in 696668 documents
complex appears 468669 times in 440339 documents.

Again, thanks all!

Kris

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Christopher Browne 2004-10-01 22:10:12 Re: Does PostgreSQL Stores its database in multiple disks?
Previous Message Steve Crawford 2004-10-01 21:47:01 Re: PLEASE GOD HELP US!

Browse pgsql-hackers by date

  From Date Subject
Next Message Chris Browne 2004-10-01 22:02:35 Re: AIX and V8 beta 3
Previous Message Tom Lane 2004-10-01 21:17:28 Re: date_trunc'd timestamp index possible?