Planner should use index on a LIKE 'foo%' query

From: Moritz Onken <onken(at)houseofdesign(dot)de>
To: pgsql-performance(at)postgresql(dot)org
Subject: Planner should use index on a LIKE 'foo%' query
Date: 2008-06-28 16:24:42
Message-ID: 287E4428-BECC-46D6-BC47-EA94EA0121F0@houseofdesign.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

I have a query

select count(*)
from result
where exists
(select * from item where item.url LIKE result.url || '%' limit 1);

which basically returns the number of items which exist in table
result and match a URL in table item by its prefix.
I read all about idexes (http://www.postgresql.org/docs/8.3/static/indexes-types.html
) and especially this part:
"The optimizer can also use a B-tree index for queries involving the
pattern matching operators LIKE and ~ if the pattern is a constant and
is anchored to the beginning of the string — for example, col LIKE 'foo
%' or col ~ '^foo', but not col LIKE '%bar'."

Since my server does not use the C locale I created the index with

CREATE INDEX test_index
ON item
USING btree
(url varchar_pattern_ops);

which works fine for queries like

SELECT distinct url from item where url like 'http://www.micro%'
limit 10;

explain analyze shows:
"Limit (cost=9.53..9.54 rows=1 width=34) (actual time=80.809..80.856
rows=10 loops=1)"
" -> Unique (cost=9.53..9.54 rows=1 width=34) (actual
time=80.806..80.835 rows=10 loops=1)"
" -> Sort (cost=9.53..9.53 rows=1 width=34) (actual
time=80.802..80.812 rows=11 loops=1)"
" Sort Key: url"
" Sort Method: quicksort Memory: 306kB"
" -> Index Scan using test_index on item
(cost=0.00..9.52 rows=1 width=34) (actual time=0.030..6.165 rows=2254
loops=1)"
" Index Cond: (((url)::text ~>=~ 'http://
www.micro'::text) AND ((url)::text ~<~ 'http://www.micrp'::text))"
" Filter: ((url)::text ~~ 'http://www.micro%'::text)"
"Total runtime: 80.908 ms"

which is great but if I run the query with the subselect it uses a
sequence scan:

select *
from result
where exists
(select * from item where item.url LIKE result.url || '%' limit 1)
limit 10;

"Limit (cost=0.00..96.58 rows=10 width=36) (actual
time=12.660..35295.928 rows=10 loops=1)"
" -> Seq Scan on result (cost=0.00..93886121.77 rows=9721314
width=36) (actual time=12.657..35295.906 rows=10 loops=1)"
" Filter: (subplan)"
" SubPlan"
" -> Limit (cost=0.00..4.81 rows=1 width=42) (actual
time=2715.061..2715.061 rows=1 loops=13)"
" -> Seq Scan on item (cost=0.00..109589.49
rows=22781 width=42) (actual time=2715.055..2715.055 rows=1 loops=13)"
" Filter: ((url)::text ~~ (($0)::text ||
'%'::text))"
"Total runtime: 35295.994 ms"

The only explaination is that I don't use a constant when comparing
the values. But actually it is a constant...

any help?

using postgres 8.3.3 on ubuntu.

Cheers,

moritz

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Steinar H. Gunderson 2008-06-28 19:19:31 Re: Planner should use index on a LIKE 'foo%' query
Previous Message Tom Lane 2008-06-28 15:53:43 Re: Subquery WHERE IN or WHERE EXISTS faster?