Skip site navigation (1) Skip section navigation (2)

Re: How to boost performance of queries containing pattern matching characters

From: Steve Atkins <steve(at)blighty(dot)com>
To: PostgreSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: How to boost performance of queries containing pattern matching characters
Date: 2011-02-14 16:43:24
Message-ID: 6A5D2BA2-000E-438F-B726-AFE3DD90FFE6@blighty.com (view raw or flat)
Thread:
Lists: pgsql-performance
On Feb 14, 2011, at 12:09 AM, Artur Zając wrote:

>> Looks like you've almost re-invented the trigram module:
>>  http://www.postgresql.org/docs/9.0/static/pgtrgm.html
> 
> I didn't know about this module.
> Idea to use three letters strings and use Full Text Search is the same, but
> the rest is not.
> 
> Is the idea to use similarity for this problem is really correct? How should
> be query constructed to return really all records matching ILIKE criteria?

If what you really want is the ability to select email addresses based on
subdomain, you might want to do this instead:

create email_domain_idx on mytable (reverse(lower(split_part(email, '@', 2))));

Then you can do things like this ...

delete from mytable where reverse(lower(split_part(email, '@', 2))) = reverse('aol.com');

... to delete all aol.com users or like this ...

delete from mytable where reverse(lower(split_part(email, '@', 2))) like reverse('%.aol.com');

... to delete all email addresses that are in a subdomain of "aol.com".

You need a reverse() function to do that. Here's one in plpgsql:

CREATE OR REPLACE FUNCTION reverse(text) RETURNS text AS '
DECLARE
       original alias for $1;
       reverse_str text;
       i int4;
BEGIN
 reverse_str = '''';
 FOR i IN REVERSE LENGTH(original)..1 LOOP
  reverse_str = reverse_str || substr(original,i,1);
 END LOOP;
 return reverse_str;
END;'
LANGUAGE 'plpgsql' IMMUTABLE;

(Normalizing the email address so that you store local part and domain part separately is even better, but an index on the reverse of the domain is still useful for working with subdomains).

Cheers,
  Steve


In response to

pgsql-performance by date

Next:From: Strange, John WDate: 2011-02-15 11:45:28
Subject: Checkpointing question
Previous:From: Greg SmithDate: 2011-02-14 15:16:27
Subject: Re: How to boost performance of queries containing pattern matching characters

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group