Quick Links

Regex matching where text is input and regex stored in column

From:	Achilleas Mantzios - cloud <a(dot)mantzios(at)cloud(dot)gatewaynet(dot)com>
To:	pgsql-sql(at)postgresql(dot)org
Subject:	Regex matching where text is input and regex stored in column
Date:	2023-09-15 10:52:52
Message-ID:	351abe02-57a5-a219-a789-57f3b6079f73@cloud.gatewaynet.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-sql

Hello

I have a table like :

amantzio(at)sma/dynacom=# \d mail_vessel_addressbook
                                   Table "public.mail_vessel_addressbook"
Column | Type   | Collation | Nullable |
                             Default
---------+---------+-----------+----------+--------------------------------------------------------------------

.....

where address stores basically email patterns, human / edited text with
possible wild cards, meant to be input mainly by our admins. There is an
query that checks a particular new address (this runs via an exim4 mail
server) which has become quite slow :

SELECT text('news3(at)newsaccess(dot)in') from mail_global_addressbook where
text('news3(at)newsaccess(dot)in') ~*
replace(replace(replace(address,'*','.*'),'+','.+'),'?','.?') UNION SELE
CT text('news3(at)newsaccess(dot)in') from mail_vessel_addressbook where
text('news3(at)newsaccess(dot)in') ~*
replace(replace(replace(address,'*','.*'),'+','.+'),'?','.?') LIMIT 1;

It basically seq scans both tables, and the 2nd table in particular is
fairly big : 123966 rows. So the above query tends to take 3.6 secs on
average to execute. A pg_trgm index would not help since there is no way
(that I know of) to index a regex in pgsql. Most common use is about
indexing values to be checked against a given regex, quite the reverse
from what we need here.

I got great improvement by materializing
replace(replace(replace(address,'*','.*'),'+','.+'),'?','.?'), we are
running pgsql 10.23 :( , no generated columns possible, so I used a
trigger, and then adding a normal b-tree index on the column. This
causes Index only scan which is much better than the seq scan. Also I
had to re-write the query as :

with

qry1 as

(SELECT text('watchkeepers(at)ukmto(dot)org') from mail_vessel_addressbook
where

text('watchkeepers(at)ukmto(dot)org') ~*
replace(replace(replace(address,'*','.*'),'+','.+'),'?','.?')

LIMIT 1),
qry2 as

(SELECT text('watchkeepers(at)ukmto(dot)org') from mail_global_addressbook
where

text('watchkeepers(at)ukmto(dot)org') ~*
replace(replace(replace(address,'*','.*'),'+','.+'),'?','.?')

LIMIT 1)
select text('watchkeepers(at)ukmto(dot)org') from qry1 UNION select
text('watchkeepers(at)ukmto(dot)org') from qry2 LIMIT 1;

I was wondering if there is some more elegant and better solution,
without the need for extra column and trigger.

Responses

Re: Regex matching where text is input and regex stored in column at 2023-09-16 16:35:05 from Achilleas Mantzios

Browse pgsql-sql by date

	From	Date	Subject
Next Message	Jean-Marc Voillequin (MA)	2023-09-16 09:26:29	no_data_found oracle vs pg
Previous Message	Mehmet Sabri KUNT	2023-09-14 08:11:30	Re: Query with conditional statement