Skip site navigation (1) Skip section navigation (2)

Re: LIKE search and performance

From: "Alexander Staubo" <alex(at)purefiction(dot)net>
To: Andy <frum(at)ar-sd(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: LIKE search and performance
Date: 2007-05-23 16:05:26
Message-ID: 88daf38c0705230905u1a981102k691b47bd36011a80@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-performance
On 5/23/07, Andy <frum(at)ar-sd(dot)net> wrote:
> An example would be:
> SELECT * FROM table
>                              WHERE name like '%john%' or street like '%srt%'
>
> Anyway, the query planner always does seq scan on the whole table and that
> takes some time. How can this be optimized or made in another way to be
> faster?

There's no algorithm in existence that can "index" arbitrary
substrings the way you think. The only rational way to accomplish this
is to first break the text into substrings using some algorithm (eg.,
words delimited by whitespace and punctuation), and index the
substrings individually.

You can do this using vanilla PostgreSQL, and you can use Tsearch2
and/or its GIN indexes to help speed up the searches. The simplest
solution would be to put all the substrings in a table, one row per
substring, along with an attribute referencing the source table/row.

At this point you have effectively reduced your search space: you can
use a query to isolate the words in your "dictionary" that contain the
substrings. So a query might be:

  select ... from persons where id in (
    select person_id from person_words
    where word like '%john%';
  )

The "like" search, even though it will use a sequential scan, is bound
to be much faster on these small words than searching for substrings
through large gobs of text in the persons table.

Note that PostgreSQL *can* exploit the index for *prefix* matching, if
you tell it to use the right operator class:

  create index persons_name_index on persons (name text_pattern_ops);

or, if you're using varchars (though there is rarely any reason to):

  create index persons_name_index on persons (name varchar_pattern_ops);

(These two may be identical internally. Check the docs.)

Now you can do:

  select ... from persons where name like 'john%';

which will yield a query plan such as this:

 Index Scan using persons_name_index on persons  (cost=0.00..8.27
rows=1 width=29) (actual time=0.184..0.373 rows=51 loops=1)
   Index Cond: ((name ~>=~ 'john'::text) AND (name ~<~ 'joho'::text))
   Filter: (title ~~ 'john%'::text)

Alexander.

In response to

Responses

pgsql-performance by date

Next:From: Y SidhuDate: 2007-05-23 16:22:22
Subject: max_fsm_pages, shared_buffers and checkpoint_segments
Previous:From: Guido NeitzerDate: 2007-05-23 16:00:18
Subject: Re: LIKE search and performance

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group