Re: Relative performance of prefix and suffix string matching

From: "Stéphane A(dot) Schildknecht" <stephane(dot)schildknecht(at)postgresql(dot)fr>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Relative performance of prefix and suffix string matching
Date: 2011-09-23 10:53:55
Message-ID: 4E7C6543.5040501@postgresql.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Le 23/09/2011 12:30, Alban Hertroys a écrit :
> On 23 September 2011 11:47, Andrew Rose <andrew(dot)rose(at)metaswitch(dot)com
> <mailto:andrew(dot)rose(at)metaswitch(dot)com>> wrote:
>
> Basic Question: In text fields, is prefix matching significantly faster
> than suffix matching?
>
>
> It does depend on what type of index you use. BTrees split off text strings,
> from left to right, halving the number of records you need to scan at every
> branch. For a suffix match, that's exactly the wrong way around.
> Hash indexes probably don't fare any better.
> I don't know enough about GIST or GIN indexes to comment on their suitability
> for suffix matches, but presumably they're better at those.
>
> I recall doing suffix matches used to be a problem in at least earlier versions
> of Postgres, but it's quite possible that the query planner is smart enough to
> do the reverse match by itself nowadays (I doubt it, seeing that it would also
> need to reverse the way the index is organised).
>
>
> 2. Alternatively, I could store column 'rev_str' as a reversed version of
> column 'str' and have the client produce a reversed version of x on each
> query (call it r). Then the client would issue...
>
> SELECT * FROM tbl WHERE str LIKE 'x%' OR rev_str LIKE 'r%'
>
> ...which would use prefix matches only instead of requiring suffix matches.
> Since I've seen this form used by others, I was wondering if it's
> necessary - i.e. if databases really do perform prefix matching faster?
>
> 3. Is there a solution I'm unaware of with even better performance?
>
>
> You can create a functional index on the reverse of the string, that way
> omitting the need for an extra column (that needs updating as well).
>
> CREATE INDEX tbl_str_rev ON tbl (reverse(str));
> SELECT * FROM tbl WHERE str LIKE 'x%' OR reverse(str) LIKE 'x%';
>
> See: http://www.postgresql.org/docs/9.0/static/indexes-expressional.html

You can use the pg_trgm extension which lets you create gin or gist indexes on
your text field.

There is also wildspeed (see http://www.sai.msu.su/~megera/wiki/wildspeed).

Didn't try the latter solution, but the first one gives really great result for
searching partial strings.

a propos, there's one thing I'd like to know, is how to set the similarity
limit within pg_trgm on a server side (I'd like to have it settled to 0.2 for
every new session, for instance).

Regards,
--
Stéphane Schildknecht
http://www.loxodata.com
Contact régional PostgreSQL

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Thomas Kellerer 2011-09-23 11:40:35 Re: Problem with pg_upgrade 9.0 -> 9.1
Previous Message Albe Laurenz 2011-09-23 10:48:15 Re: Speed of lo_unlink vs. DELETE on BYTEA