combined indexes with Gist - planner issues?

From: Hans-Juergen Schoenig -- PostgreSQL <postgres(at)cybertec(dot)at>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Zoltan Boszormenyi <zb(at)cybertec(dot)at>
Subject: combined indexes with Gist - planner issues?
Date: 2009-08-31 12:08:25
Message-ID: 4A9BBD39.2010500@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

hello everybody,

we are seriously fighting with some planner issue which seems to be
slightly obscure to us.
we have a table which is nicely indexed (several GB in size).
i am using btree_gist operator classes to use a combined index including
an FTI expression along with a number:

db=# \d product.t_product
Table "product.t_product"
Column | Type |
Modifiers
-----------------------+---------------+----------------------------------------------------------------
id | bigint | not null default
nextval('product.t_product_id_seq'::regclass)
shop_id | integer |
art_number | text |
title | text |
description | text |
display_price | numeric(10,4) |

Indexes:
"t_product_pkey" PRIMARY KEY, btree (id)
"idx_test" gist (display_price, to_tsvector('german'::regconfig,
(title || ' '::text) || description))
* "idx_test2" gist (to_tsvector('german'::regconfig, (title || '
'::text) || description), display_price)*

what we basically expected here is that Postgres will scan the table
using the index to give us the cheapest products containing the words we
are looking for.
i am totally surprised to see that we have to fetch all products given
the words, sort and then do the limit.
this totally kills performance because some words simply show up
millions of times. this totally kills everything.

the plans look like this:

db=# explain analyze SELECT art_number, title
FROM product.t_product
WHERE to_tsvector('german'::regconfig, (title || ' '::text) ||
description) @@ plainto_tsquery('harddisk')
ORDER BY display_price
LIMIT 10;
QUERY
PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=108340.08..108340.10 rows=10 width=54) (actual
time=1328.900..1328.909 rows=10 loops=1)
-> Sort (cost=108340.08..108422.48 rows=32961 width=54) (actual
time=1328.899..1328.905 rows=10 loops=1)
Sort Key: display_price
Sort Method: top-N heapsort Memory: 18kB
-> Bitmap Heap Scan on t_product (cost=2716.62..107627.80
rows=32961 width=54) (actual time=1052.706..1328.772 rows=55 loops=1)
Recheck Cond: (to_tsvector('german'::regconfig, ((title
|| ' '::text) || description)) @@ plainto_tsquery('harddisk'::text))
-> Bitmap Index Scan on idx_test2 (cost=0.00..2708.38
rows=32961 width=0) (actual time=1052.576..1052.576 rows=55 loops=1)
Index Cond: (to_tsvector('german'::regconfig,
((title || ' '::text) || description)) @@ plainto_tsquery('harddisk'::text))
Total runtime: 1328.942 ms
(9 rows)

runtime increases badly if words start to be more likely ...

db=# explain analyze SELECT art_number, title
FROM product.t_product
WHERE to_tsvector('german'::regconfig, (title || ' '::text) ||
description) @@ plainto_tsquery('spiel')
ORDER BY display_price
LIMIT 10;
QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=108340.08..108340.10 rows=10 width=54) (actual
time=33489.675..33489.682 rows=10 loops=1)
-> Sort (cost=108340.08..108422.48 rows=32961 width=54) (actual
time=33489.675..33489.675 rows=10 loops=1)
Sort Key: display_price
Sort Method: top-N heapsort Memory: 18kB
-> Bitmap Heap Scan on t_product (cost=2716.62..107627.80
rows=32961 width=54) (actual time=774.923..33408.522 rows=56047 loops=1)
Recheck Cond: (to_tsvector('german'::regconfig, ((title
|| ' '::text) || description)) @@ plainto_tsquery('spiel'::text))
-> Bitmap Index Scan on idx_test2 (cost=0.00..2708.38
rows=32961 width=0) (actual time=759.078..759.078 rows=56047 loops=1)
Index Cond: (to_tsvector('german'::regconfig,
((title || ' '::text) || description)) @@ plainto_tsquery('spiel'::text))
Total runtime: 33489.906 ms
(9 rows)

i am wondering why postgres is not able to use a combined index here?
is this some obscure thing related to gist, a logical problem or a
planner deficiency?

ideas are welcome.

many thanks,

hans

--
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2009-08-31 12:50:28 Hot Standby, conflict cache
Previous Message Greg Stark 2009-08-31 11:27:46 Re: Tightening binary receive functions