From: | Artur Zając <azajac(at)ang(dot)com(dot)pl> |
---|---|
To: | <gnanam(at)zoniac(dot)com>, "'Richard Huxton'" <dev(at)archonet(dot)com> |
Cc: | <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: How to boost performance of queries containing pattern matching characters |
Date: | 2011-02-14 07:38:48 |
Message-ID: | 003801cbcc1a$37d2f7e0$a778e7a0$@ang.com.pl |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
>How can we boost performance of queries containing pattern matching
>characters? In my case, we're using a percent sign (%) that matches any
string of zero or more characters.
>
> QUERY: DELETE FROM MYTABLE WHERE EMAIL ILIKE '%domain.com%'
>
> EMAIL column is VARCHAR(256).
>
> As it is clear from the above query, email is matched "partially and
case-insensitively", which my application requirement demands.
>
> In case, if it were a full match, I could easily define a functional
> INDEX on EMAIL column (lower(EMAIL)) and I could rewrite my DELETE where
criteria like lower(EMAIL) = 'someemail(at)domain(dot)com'.
>
> MYTABLE currently contains 2 million records and grows consistently.
I had almost the same problem.
To resolve it, I created my own text search parser (myftscfg) which divides
text in column into three letters parts, for example:
someemail(at)domain(dot)com is divided to som, ome,mee,eem,ema,mai,ail,il@,
l(at)d,@do,dom,oma,mai,ain,in.,n.c,.co,com
There should be also index on email column:
CREATE INDEX "email _fts" on mytable using gin
(to_tsvector('myftscfg'::regconfig, email))
Every query like email ilike '%domain.com%' should be rewrited to:
WHERE
to_tsvector('myftscfg',email) @@ to_tsquery('dom') AND
to_tsvector('myftscfg',email) @@ to_tsquery('oma') AND
to_tsvector('myftscfg',email) @@ to_tsquery('mai') AND
to_tsvector('myftscfg',email) @@ to_tsquery('ain') AND
to_tsvector('myftscfg',email) @@ to_tsquery('in.') AND
to_tsvector('myftscfg',email) @@ to_tsquery('n.c') AND
to_tsvector('myftscfg',email) @@ to_tsquery('.co') AND
to_tsvector('myftscfg',email) @@ to_tsquery('com') AND email ILIKE
'%domain.com%';
Index is reducing number of records and clause email ILIKE '%domain.com%' is
selecting only valid records.
I didn't found better solution.
-------------------------------------------
Artur Zajac
From | Date | Subject | |
---|---|---|---|
Next Message | Richard Huxton | 2011-02-14 07:39:33 | Re: How to boost performance of queries containing pattern matching characters |
Previous Message | Gnanakumar | 2011-02-14 07:28:23 | Re: How to boost performance of queries containing pattern matching characters |