Skip site navigation (1) Skip section navigation (2)

Planner should use index on a LIKE 'foo%' query

From: Moritz Onken <onken(at)houseofdesign(dot)de>
To: pgsql-performance(at)postgresql(dot)org
Subject: Planner should use index on a LIKE 'foo%' query
Date: 2008-06-28 16:24:42
Message-ID: 287E4428-BECC-46D6-BC47-EA94EA0121F0@houseofdesign.de (view raw or flat)
Thread:
Lists: pgsql-performance
Hi,

I have a query

select count(*)
  from result
  where exists
    (select * from item where item.url LIKE result.url || '%' limit 1);

which basically returns the number of items which exist in table  
result and match a URL in table item by its prefix.
I read all about idexes (http://www.postgresql.org/docs/8.3/static/indexes-types.html 
) and especially this part:
"The optimizer can also use a B-tree index for queries involving the  
pattern matching operators LIKE and ~ if the pattern is a constant and  
is anchored to the beginning of the string — for example, col LIKE 'foo 
%' or col ~ '^foo', but not col LIKE '%bar'."

Since my server does not use the C locale I created the index with

CREATE INDEX test_index
   ON item
   USING btree
   (url varchar_pattern_ops);

which works fine for queries like

  SELECT distinct url from item where url like 'http://www.micro%'  
limit 10;

explain analyze shows:
"Limit  (cost=9.53..9.54 rows=1 width=34) (actual time=80.809..80.856  
rows=10 loops=1)"
"  ->  Unique  (cost=9.53..9.54 rows=1 width=34) (actual  
time=80.806..80.835 rows=10 loops=1)"
"        ->  Sort  (cost=9.53..9.53 rows=1 width=34) (actual  
time=80.802..80.812 rows=11 loops=1)"
"              Sort Key: url"
"              Sort Method:  quicksort  Memory: 306kB"
"              ->  Index Scan using test_index on item   
(cost=0.00..9.52 rows=1 width=34) (actual time=0.030..6.165 rows=2254  
loops=1)"
"                    Index Cond: (((url)::text ~>=~ 'http:// 
www.micro'::text) AND ((url)::text ~<~ 'http://www.micrp'::text))"
"                    Filter: ((url)::text ~~ 'http://www.micro%'::text)"
"Total runtime: 80.908 ms"

which is great but if I run the query with the subselect it uses a  
sequence scan:

select *
  from result
  where exists
    (select * from item where item.url LIKE result.url || '%' limit 1)  
limit 10;

"Limit  (cost=0.00..96.58 rows=10 width=36) (actual  
time=12.660..35295.928 rows=10 loops=1)"
"  ->  Seq Scan on result  (cost=0.00..93886121.77 rows=9721314  
width=36) (actual time=12.657..35295.906 rows=10 loops=1)"
"        Filter: (subplan)"
"        SubPlan"
"          ->  Limit  (cost=0.00..4.81 rows=1 width=42) (actual  
time=2715.061..2715.061 rows=1 loops=13)"
"                ->  Seq Scan on item  (cost=0.00..109589.49  
rows=22781 width=42) (actual time=2715.055..2715.055 rows=1 loops=13)"
"                      Filter: ((url)::text ~~ (($0)::text ||  
'%'::text))"
"Total runtime: 35295.994 ms"


The only explaination is that I don't use a constant when comparing  
the values. But actually it is a constant...


any help?

using postgres 8.3.3 on ubuntu.



Cheers,

moritz


Responses

pgsql-performance by date

Next:From: Steinar H. GundersonDate: 2008-06-28 19:19:31
Subject: Re: Planner should use index on a LIKE 'foo%' query
Previous:From: Tom LaneDate: 2008-06-28 15:53:43
Subject: Re: Subquery WHERE IN or WHERE EXISTS faster?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group