Re: String searching

From: Jonathan Vanasco <postgres(at)2xlp(dot)com>
To: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: String searching
Date: 2014-11-18 01:54:38
Message-ID: 043CD98C-B050-4163-A044-A6035F388D15@2xlp.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


On Nov 17, 2014, at 12:55 PM, Robert DiFalco wrote:

> SELECT * FROM MyTable WHERE upper(FullName) LIKE upper('%John%');
>
> That said, which would be the best extension module to use? A "gist" index on the uppercased column? Or something else? Thanks!

Performance wise, I think a function index would probably be the best:

CREATE INDEX mytable_lower_fullname_idx ON mytable(lower(fullname));

SELECT * FROM mytable WHERE lower(fullname) LIKE lower('%john%');

The only reason why I use `lower` and not `upper` is that it's easier to look at when dealing with debugging and sample queries.

I'd bench against GIN and GIST, but I think this will work the best.

The reason is that GIN/GIST use language patterns to simplify the index. so they work great on "words"

select plainto_tsquery('doing watching reading programming');
'watch' & 'read' & 'program'

but not so great on "names":

select plainto_tsquery('john doe');
'john' & 'doe'

select plainto_tsquery('jon doe');
'jon' & 'doe

So you'll get a bit more overhead on the match and you won't get a smaller index (which is why they're great for fulltext)

The search execution might turn out to be much faster. If so, i'd love to know. But doing a lower() search on a lower() function index has always been ridiculously fast for me.

This only goes for names though. If you're searching other fields, then another search method might be considerably better.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David G Johnston 2014-11-18 02:03:46 Re: String searching
Previous Message Tom Lane 2014-11-18 01:26:24 Re: count distinct slow?