Re: LIKE search and performance

From: PFC <lists(at)peufeu(dot)com>
To: "Mark Lewis" <mark(dot)lewis(at)mir3(dot)com>, "James Mansion" <james(at)mansionfamily(dot)plus(dot)com>
Cc: "Magnus Hagander" <magnus(at)hagander(dot)net>, "Alexander Staubo" <alex(at)purefiction(dot)net>, Andy <frum(at)ar-sd(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: LIKE search and performance
Date: 2007-05-24 22:09:15
Message-ID: op.tsuqhpzycigqcu@apollo13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


> PG could scan the index looking for matches first and only load the
> actual rows if it found a match, but that could only be a possible win
> if there were very few matches, because the difference in cost between a
> full index scan and a sequential scan would need to be greater than the
> cost of randomly fetching all of the matching data rows from the table
> to look up the visibility information.

If you need to do that kind of thing, ie. seq scanning a table checking
only one column among a large table of many columns, then don't use an
index. An index, being a btree, needs to be traversed in order (or else, a
lot of locking problems come up) which means some random accesses.

So, you could make a table, with 2 columns, updated via triggers : your
text field, and the primary key of your main table. Scanning that would be
faster.

Still, a better solution for searching in text is :

- tsearch2 if you need whole words
- trigrams for any substring match
- xapian for full text search with wildcards (ie. John* = Johnny)

Speed-wise those three will beat any seq scan on a large table by a huge
margin.

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Richard Huxton 2007-05-25 08:13:25 Re: LIKE search and performance
Previous Message Craig James 2007-05-24 22:08:16 Re: LIKE search and performance