Re: Relative performance of prefix and suffix string matching

From: Alban Hertroys <haramrae(at)gmail(dot)com>
To: Andrew Rose <andrew(dot)rose(at)metaswitch(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Relative performance of prefix and suffix string matching
Date: 2011-09-23 10:30:17
Message-ID: CAF-3MvN89TbDVMEE7uy8gfLuqShieCYUNQPSC4CzJOzYp88s3A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 23 September 2011 11:47, Andrew Rose <andrew(dot)rose(at)metaswitch(dot)com> wrote:

> Basic Question: In text fields, is prefix matching significantly faster
> than suffix matching?
>

It does depend on what type of index you use. BTrees split off text strings,
from left to right, halving the number of records you need to scan at every
branch. For a suffix match, that's exactly the wrong way around.
Hash indexes probably don't fare any better.
I don't know enough about GIST or GIN indexes to comment on their
suitability for suffix matches, but presumably they're better at those.

I recall doing suffix matches used to be a problem in at least earlier
versions of Postgres, but it's quite possible that the query planner is
smart enough to do the reverse match by itself nowadays (I doubt it, seeing
that it would also need to reverse the way the index is organised).

> 2. Alternatively, I could store column 'rev_str' as a reversed version of
> column 'str' and have the client produce a reversed version of x on each
> query (call it r). Then the client would issue...
>
> SELECT * FROM tbl WHERE str LIKE 'x%' OR rev_str LIKE 'r%'
>
> ...which would use prefix matches only instead of requiring suffix matches.
> Since I've seen this form used by others, I was wondering if it's necessary
> - i.e. if databases really do perform prefix matching faster?
>
> 3. Is there a solution I'm unaware of with even better performance?
>

You can create a functional index on the reverse of the string, that way
omitting the need for an extra column (that needs updating as well).

CREATE INDEX tbl_str_rev ON tbl (reverse(str));
SELECT * FROM tbl WHERE str LIKE 'x%' OR reverse(str) LIKE 'x%';

See: http://www.postgresql.org/docs/9.0/static/indexes-expressional.html

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Ingmar Brouns 2011-09-23 10:31:49 Re: Query performs badly with materialize node
Previous Message Tore Halvorsen 2011-09-23 10:27:28 Re: Relative performance of prefix and suffix string matching