tsearch2 question (was: Poor performance on seq scan)

From: Laszlo Nagy <gandalf(at)designaproduct(dot)biz>
To: pgsql-performance(at)postgresql(dot)org
Subject: tsearch2 question (was: Poor performance on seq scan)
Date: 2006-09-12 17:01:32
Message-ID: 4506E7EC.8080605@designaproduct.biz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Tom Lane wrote:
> Only if the index is capable of disgorging the original value of the
> indexed column, a fact not in evidence in general (counterexample:
> polygons indexed by their bounding boxes in an r-tree). But yeah,
> it's interesting to think about applying filters at the index fetch
> step for index types that can hand back full values. This has been
> discussed before --- I think we had gotten as far as speculating about
> doing joins with just index values. See eg here:
> http://archives.postgresql.org/pgsql-hackers/2004-05/msg00944.php
> A lot of the low-level concerns have already been dealt with in order to
> support bitmap indexscans, but applying non-indexable conditions before
> fetching from the heap is still not done.
>
To overcome this problem, I created a smaller "shadow" table:

CREATE TABLE product_search
(
id int8 NOT NULL,
name_desc text,
CONSTRAINT pk_product_search PRIMARY KEY (id)
);

insert into product_search
select
id,
name || ' ' || coalesce(description,'')
from product;

Obviously, this is almost like an index, but I need to maintain it
manually. I'm able to search with

zeusd1=> explain analyze select id from product_search where name_desc
like '%Mug%';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Seq Scan on product_search (cost=0.00..54693.34 rows=36487 width=8)
(actual time=20.036..2541.971 rows=91399 loops=1)
Filter: (name_desc ~~ '%Mug%'::text)
Total runtime: 2581.272 ms
(3 rows)

The total runtime remains below 3 sec in all cases. Of course I still
need to join the main table to the result:

explain analyze select s.id,p.name from product_search s inner join
product p on (p.id = s.id) where s.name_desc like '%Tiffany%'

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..55042.84 rows=58 width=40) (actual
time=164.437..3982.610 rows=117 loops=1)
-> Seq Scan on product_search s (cost=0.00..54693.34 rows=58
width=8) (actual time=103.651..2717.914 rows=117 loops=1)
Filter: (name_desc ~~ '%Tiffany%'::text)
-> Index Scan using pk_product_id on product p (cost=0.00..6.01
rows=1 width=40) (actual time=10.793..10.796 rows=1 loops=117)
Index Cond: (p.id = "outer".id)
Total runtime: 4007.283 ms
(6 rows)

Took 4 seconds. Awesome! With the original table, it used to be one or
two minutes!

Now you can ask, why am I not using tsearch2 for this? Here is answer:

CREATE TABLE product_search
(
id int8 NOT NULL,
ts_name_desc tsvector,
CONSTRAINT pk_product_search PRIMARY KEY (id)
);

insert into product_search
select
id,
to_tsvector(name || ' ' coalesce(description,''))
from product;

CREATE INDEX idx_product_search_ts_name_desc ON product_search USING
gist (ts_name_desc);
VACUUM product_search;

zeusd1=> explain analyze select id from product_search where
ts_name_desc @@ to_tsquery('mug');
QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------

Bitmap Heap Scan on product_search (cost=25.19..3378.20 rows=912
width=8) (actual time=954.669..13112.009 rows=91434 loops=1)
Filter: (ts_name_desc @@ '''mug'''::tsquery)
-> Bitmap Index Scan on idx_product_search_ts_name_desc
(cost=0.00..25.19 rows=912 width=0) (actual time=932.455..932.455
rows=91436 loops=1)
Index Cond: (ts_name_desc @@ '''mug'''::tsquery)
Total runtime: 13155.724 ms
(5 rows)

zeusd1=> explain analyze select id from product_search where
ts_name_desc @@ to_tsquery('tiffany');

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------

Bitmap Heap Scan on product_search (cost=25.19..3378.20 rows=912
width=8) (actual time=13151.725..13639.112 rows=76 loops=1)
Filter: (ts_name_desc @@ '''tiffani'''::tsquery)
-> Bitmap Index Scan on idx_product_search_ts_name_desc
(cost=0.00..25.19 rows=912 width=0) (actual time=13123.705..13123.705
rows=81 loops=1)
Index Cond: (ts_name_desc @@ '''tiffani'''::tsquery)
Total runtime: 13639.478 ms
(5 rows)

At least 13 seconds, and the main table is not joined yet. Can anybody
explain to me, why the seq scan is faster than the bitmap index? In the
last example there were only 81 rows returned, but it took more than 13
seconds. :( Even if the whole table can be cached into memory (which
isn't the case), the bitmap index should be much faster. Probably there
is a big problem with my schema but I cannot find it. What am I doing wrong?

Thanks,

Laszlo

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Laszlo Nagy 2006-09-12 17:12:36 Re: Poor performance on seq scan
Previous Message Tom Lane 2006-09-12 16:52:06 Re: Poor performance on seq scan