Index not used on select (Is this more OR + LIKE?)

From: Mike Mascari <mascarim(at)yahoo(dot)com>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Index not used on select (Is this more OR + LIKE?)
Date: 1999-07-23 16:19:41
Message-ID: 19990723161942.21461.rocketmail@send205.yahoomail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I've been reading Ole's posts with great interest
as we've have just experienced similar problems
ourselves here. I opened up a Slip ;-) #5 but
I've narrowed down the difference between two
queries which illustrate the problem:

The first query:

SELECT DISTINCT
supplies.supply,supplies.supplyunit,
supplies.purchaseunit,supplies.vendor,
supplies.vendorgroup,supplies.vendoritem,
supplies.vendorname,supplies.description,
supplies.conversion,supplies.price,
supplies.inventory,supplies.commodity,
supplies.adddate
FROM
supplies,permitbuy,locations,supplychains,reserves
WHERE
permitbuy.webuser = 'mascarj' AND
(locations.company,locations.costcntr) =
(permitbuy.company, permitbuy.costcntr) AND
supplychains.target = locations.target AND
reserves.target = supplychains.supplysource AND
supplies.supply = reserves.supply AND
supplies.inventory = '1' AND

((upper(supplies.supply) LIKE '%SEQ%') OR
(upper(supplies.vendoritem) LIKE '%SEQ%') OR
(upper(supplies.vendorname) LIKE '%SEQ%') OR
(upper(supplies.description) LIKE '%SEQ%'))

ORDER BY
supplies.description;

The EXPLAIN shows its using indices as it should:
NOTICE: QUERY PLAN:

Unique (cost=24076.77 rows=8260854 width=220)
-> Sort (cost=24076.77 rows=8260854 width=220)
-> Hash Join (cost=24076.77 rows=8260854
width=220)
-> Hash Join (cost=1756.00 rows=597537
width=76)
-> Seq Scan on reserves
(cost=938.44 rows=20468 width=16)
-> Hash (cost=121.44 rows=475
width=60)
-> Hash Join (cost=121.44
rows=475 width=60)
-> Seq Scan on
supplychains (cost=49.28 rows=1251 width=8)
-> Hash (cost=26.80
rows=93 width=52)
-> Hash Join
(cost=26.80 rows=93 width=52)
-> Seq
Scan on locations (cost=10.09 rows=245 width=28)
-> Hash
(cost=5.78 rows=56 width=24)
->
Index Scan using k_permitbuy1 on permitbuy (cost=5.78
rows=56 width=24)
-> Hash (cost=1675.03 rows=17637
width=144)
-> Seq Scan on supplies
(cost=1675.03 rows=17637 width=144)

EXPLAIN

This query works as expected and returns within
a reasonable amount of time. However, if an OR
clause is introduced as below:

SELECT DISTINCT
supplies.supply,supplies.supplyunit,
supplies.purchaseunit,supplies.vendor,
supplies.vendorgroup,supplies.vendoritem,
supplies.vendorname,supplies.description,
supplies.conversion,supplies.price,
supplies.inventory,supplies.commodity,
supplies.adddate
FROM
supplies,permitbuy,locations,supplychains,reserves
WHERE
permitbuy.webuser = 'mascarj' AND
(locations.company,locations.costcntr) =
(permitbuy.company, permitbuy.costcntr) AND
supplychains.target = locations.target AND
reserves.target = supplychains.supplysource AND
supplies.supply = reserves.supply AND
supplies.inventory = '1' AND

((upper(supplies.supply) LIKE '%SEQ%') OR
(upper(supplies.vendoritem) LIKE '%SEQ%') OR
(upper(supplies.vendorname) LIKE '%SEQ%') OR
(upper(supplies.description) LIKE '%SEQ%'))

OR <-- This is built by our search engine to allow
-- our users to enter: [SEQ or SCD]...

((upper(supplies.supply) LIKE '%SCD%') OR
(upper(supplies.vendoritem) LIKE '%SCD%') OR
(upper(supplies.vendorname) LIKE '%SCD%') OR
(upper(supplies.description) LIKE '%SCD%'))

ORDER BY
supplies.description;

The EXPLAIN shows that it doesn't bother to use
the indices for ANY of the joins:

NOTICE: QUERY PLAN:

Unique (cost=63290466304.00 rows=1073741850
width=232)
-> Sort (cost=63290466304.00 rows=1073741850
width=232)
-> Nested Loop (cost=63290466304.00
rows=1073741850 width=232)
-> Nested Loop (cost=52461780992.00
rows=1073741851 width=204)
-> Nested Loop
(cost=28277893120.00 rows=1073741851 width=168)
-> Nested Loop
(cost=28033934.00 rows=573217107 width=160)
-> Seq Scan on supplies
(cost=1675.03 rows=29871 width=144)
-> Seq Scan on
reserves (cost=938.44 rows=20468 width=16)
-> Seq Scan on supplychains
(cost=49.28 rows=1251 width=8) ->
Seq Scan on permitbuy (cost=22.52 rows=531 width=36)
-> Seq Scan on locations (cost=10.09
rows=245 width=28)

EXPLAIN

The plan shows that it will have to perform a
sequential scan on the supplies table, which I
obviously expected because of the use of LIKE, in
both plans. However, why is it, that, when an
OR clause which exclusively references the supplies
table is appended to the query, the planner/optimizer
(which already must perform a sequential scan on
supplies) now totally ignores all the indices
built on the other tables? The result is an
execution plan which consumes all RAM on the machine,
and, at 410M, I killed it, because it was about to
consume all swap space as well...

Any help would be greatly appreciated

Mike Mascari (mascarim(at)yahoo(dot)com)

_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 1999-07-23 16:23:44 Re: SSL patch
Previous Message Tom Lane 1999-07-23 16:03:09 Re: [HACKERS] Index not used on simple select