Re: substring with this pattern works in 8.3.1; does not work in 8.3.4

From: Tomasz Ostrowski <tometzky(at)batory(dot)org(dot)pl>
To: Chris Wood <chris(dot)wood(at)bookitnow(dot)ca>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: substring with this pattern works in 8.3.1; does not work in 8.3.4
Date: 2008-11-20 09:30:57
Message-ID: 49252E51.7080605@batory.org.pl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 2008-11-12 19:07, Chris Wood wrote:

> here is where it works in 8.3.1:
> CONSTRAINT public_phone_ch CHECK (((public_phone = ''::text) OR
> ("substring"(public_phone, '^[0-9]{10}(,[0-9]{10})*$'::text) IS NOT NULL))));
> protocalte=> insert into locn values(10, '1231231234') ;

8.3.1 had a bug in substring function which was corrected in 8.3.2:

http://www.postgresql.org/docs/8.3/static/release-8-3-2.html
Fix a corner case in regular-expression substring matching
(substring(string from pattern)) (Tom)

The problem occurs when there is a match to the pattern overall but the
user has specified a parenthesized subexpression and that subexpression
hasn't got a match. An example is substring('foo' from 'foo(bar)?').
This should return NULL, since (bar) isn't matched, but it was
mistakenly returning the whole-pattern match instead (ie, foo).

> and here is where it does not work in 8.3.4:
> ERROR: new row for relation "locn" violates check constraint
> "public_phone_ch"

http://www.postgresql.org/docs/8.3/static/functions-matching.html#FUNCTIONS-POSIX-REGEXP
The substring function with two parameters, substring(string from
pattern), provides extraction of a substring that matches a POSIX
regular expression pattern. It returns null if there is no match,
otherwise the portion of the text that matched the pattern. But *if*
*the* *pattern* *contains* *any* *parentheses*, *the* *portion* *of*
*the* *text* *that* *matched* *the* *first* *parenthesized*
*subexpression* (the one whose left parenthesis comes first) *is*
*returned*. You can put parentheses around the whole expression if you
want to use parentheses within it without triggering this exception.

So your check should look like this:
CONSTRAINT public_phone_ch
CHECK (
(
(public_phone = ''::text)
OR
("substring"(public_phone, '(^[0-9]{10}(,[0-9]{10})*$)'::text)
IS NOT NULL))
)
);

Or much simpler and clear:
CONSTRAINT public_phone_ch
CHECK ( public_phone ~ '^$|^[0-9]{10}(,[0-9]{10})*$' )

> I apologize in advance for not testing on 8.3.5, but that would be
> very difficult for me.

It is not that difficult:

PGVERSION=2.3.5
mkdir /tmp/postgres
cd /tmp/postgres
wget \
ftp://ftp.postgresql.org/pub/source/v$PGVERSION/postgresql-$PGVERSION.tar.bz2
tar xjf postgresql-$PGVERSION.tar.bz2
cd postgresql-$PGVERSION
./configure --prefix=/tmp/postgres
make install
cd /tmp/postgres/bin
./initdb --no-locale -D /tmp/postgres/data
./postgres -p 54320 -D /tmp/postgres/data &
./psql -p 54320 postgres

postgres=# select version();
PostgreSQL 8.3.5 on i686-pc-linux-gnu,
compiled by GCC gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-33)

This works for any user (besides root) on any Unix-like system.

Regards
Tometzky
--
...although Eating Honey was a very good thing to do, there was a
moment just before you began to eat it which was better than when you
were...
Winnie the Pooh

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Alexandra Nitzschke 2008-11-20 10:22:22 could not read block 77 of relation 1663/16385/388818775
Previous Message Heikki Linnakangas 2008-11-20 08:53:43 Re: BUG #4539: to_char(to_timestamp('...','....')) returns the wrong result