Re: How to boost performance of queries containing pattern matching characters

From: Shaun Thomas <sthomas(at)peak6(dot)com>
To: "gnanam(at)zoniac(dot)com" <gnanam(at)zoniac(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: How to boost performance of queries containing pattern matching characters
Date: 2011-02-14 13:55:36
Message-ID: 4D593458.1020004@peak6.com
Lists: pgsql-performance

On 02/14/2011 12:59 AM, Gnanakumar wrote:

> QUERY: DELETE FROM MYTABLE WHERE EMAIL ILIKE '%domain.com%'
> EMAIL column is VARCHAR(256).

Honestly? You'd be better off normalizing this column, and maybe hiding
that fact in a view if your app requires email as a single column. Split
it into account, domain, and TLD, so user@gmail.com becomes:

email_acct (user)
email_domain (gmail)
email_tld (com)
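A minimal PostgreSQL sketch of that layout, with a one-time migration and a view for the application. The table name `mytable_split`, the `id` key and the view name `mytable_v` are hypothetical. The migration assumes every host contains a dot and treats the text after the last dot as the TLD, so mail.example.co.uk would split into `mail.example.co` and `uk`; adjust that rule to your data. It also lower-cases the domain parts, since domain names are case-insensitive and the original query used ILIKE.

```sql
-- Hypothetical normalized table; names are illustrative only.
CREATE TABLE mytable_split (
    id           serial PRIMARY KEY,
    email_acct   varchar(256) NOT NULL,  -- e.g. 'user'
    email_domain varchar(256) NOT NULL,  -- e.g. 'gmail'
    email_tld    varchar(256) NOT NULL   -- e.g. 'com'
);

-- One-time migration from the original MYTABLE.EMAIL column.
-- Assumption: the TLD is the text after the last dot of the host part.
INSERT INTO mytable_split (email_acct, email_domain, email_tld)
SELECT split_part(email, '@', 1),
       lower(regexp_replace(split_part(email, '@', 2), '[.][^.]*$', '')),  -- host minus last label
       lower(regexp_replace(split_part(email, '@', 2), '^.*[.]', ''))      -- last label only
FROM mytable;

-- View that presents a single email column again for the application.
CREATE VIEW mytable_v AS
SELECT id,
       email_acct || '@' || email_domain || '.' || email_tld AS email
FROM mytable_split;
```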

This would let you drop the leading % from your LIKE match, and then
traditional indexes would work just fine, since a left-anchored pattern
can use a B-tree index. You could also distinguish between domains with
different TLDs without using wildcards at all, and exact matches are
always faster.
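Once the parts live in their own columns, the original `ILIKE '%domain.com%'` filter can become plain equality tests that an ordinary B-tree index serves. This sketch assumes a table (hypothetically named `mytable_split`) with `email_acct`, `email_domain` and `email_tld` columns whose domain parts are stored lower-cased; the index name is illustrative too. Exact matching is narrower than the original pattern: it no longer catches hosts such as mydomain.com, which is usually the intent anyway.

```sql
-- B-tree index led by the domain: serves equality on email_domain
-- alone, or on email_domain and email_tld together.
CREATE INDEX mytable_split_domain_tld_idx
    ON mytable_split (email_domain, email_tld);

-- The original delete, with no wildcards at all.
DELETE FROM mytable_split
WHERE email_domain = 'domain'
  AND email_tld    = 'com';

-- Same domain under any single-label TLD (domain.org, domain.net, ...),
-- still without wildcards:
-- DELETE FROM mytable_split WHERE email_domain = 'domain';
```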

I might ask why you are matching with a wildcard after the TLD in the
first place. Is it really so common that you need to match .com,
.com.au, .com.bar.baz.edu, or whatever? At the very least, splitting the
account from the domain+TLD would be beneficial, as it would remove the
need for the leading wildcard, which is really what's hurting you.
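For the minimal variant, where only the account is split from the domain+TLD, a trailing wildcard remains, and PostgreSQL can still use a B-tree index for it. Two caveats apply: outside the C locale, a left-anchored LIKE only uses an index built with a pattern operator class such as `text_pattern_ops`, and since the original query used ILIKE, this sketch indexes and matches `lower(email_host)` instead. The table, column and index names are hypothetical.

```sql
-- Hypothetical two-column layout: account, plus domain and TLD together.
CREATE TABLE mytable_host (
    id         serial PRIMARY KEY,
    email_acct varchar(256) NOT NULL,  -- e.g. 'user'
    email_host varchar(256) NOT NULL   -- e.g. 'gmail.com'
);

-- Expression index on the lower-cased host. lower() returns text, and
-- text_pattern_ops lets the planner use the index for left-anchored
-- LIKE regardless of the database locale.
CREATE INDEX mytable_host_lower_idx
    ON mytable_host (lower(email_host) text_pattern_ops);

-- Only a trailing wildcard remains: matches domain.com, domain.com.au, ...
DELETE FROM mytable_host
WHERE lower(email_host) LIKE 'domain.com%';
```

The query must use the same `lower(email_host)` expression as the index definition, or the planner will not consider the index.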

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas(at)peak6(dot)com
