Re: Index used incorrectly with regular expressions on 7.4.6

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Antti Salmela <asalmela(at)iki(dot)fi>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Index used incorrectly with regular expressions on 7.4.6
Date: 2004-12-02 02:47:48
Message-ID: 1590.1101955668@sss.pgh.pa.us
Lists: pgsql-hackers

Antti Salmela <asalmela(at)iki(dot)fi> writes:
> Index is used incorrectly if constant part of the string ends with \d,

Yeah, you're right --- that code predates our use of the new regexp
engine, and it didn't know that escapes aren't simply quoted characters.

Now that I look at it, it's got a multibyte problem too :-(

If you need a patch right away, here's what I applied to the 7.4 branch.

regards, tom lane

Index: selfuncs.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/selfuncs.c,v
retrieving revision 1.147.2.3
diff -c -r1.147.2.3 selfuncs.c
*** selfuncs.c 27 Feb 2004 21:44:44 -0000 1.147.2.3
--- selfuncs.c 2 Dec 2004 02:35:48 -0000
***************
*** 3218,3223 ****
--- 3218,3225 ----
char *match;
int pos,
match_pos,
+ prev_pos,
+ prev_match_pos,
paren_depth;
char *patt;
char *rest;
***************
*** 3278,3288 ****

/* OK, allocate space for pattern */
match = palloc(strlen(patt) + 1);
! match_pos = 0;

/* note start at pos 1 to skip leading ^ */
! for (pos = 1; patt[pos]; pos++)
{
/*
* Check for characters that indicate multiple possible matches
* here. XXX I suspect isalpha() is not an adequately
--- 3280,3292 ----

/* OK, allocate space for pattern */
match = palloc(strlen(patt) + 1);
! prev_match_pos = match_pos = 0;

/* note start at pos 1 to skip leading ^ */
! for (prev_pos = pos = 1; patt[pos]; )
{
+ int len;
+
/*
* Check for characters that indicate multiple possible matches
* here. XXX I suspect isalpha() is not an adequately
***************
*** 3297,3302 ****
--- 3301,3314 ----
break;

/*
+ * In AREs, backslash followed by alphanumeric is an escape, not
+ * a quoted character. Must treat it as having multiple possible
+ * matches.
+ */
+ if (patt[pos] == '\\' && isalnum((unsigned char) patt[pos + 1]))
+ break;
+
+ /*
* Check for quantifiers. Except for +, this means the preceding
* character is optional, so we must remove it from the prefix
* too!
***************
*** 3305,3318 ****
patt[pos] == '?' ||
patt[pos] == '{')
{
! if (match_pos > 0)
! match_pos--;
! pos--;
break;
}
if (patt[pos] == '+')
{
! pos--;
break;
}
if (patt[pos] == '\\')
--- 3317,3329 ----
patt[pos] == '?' ||
patt[pos] == '{')
{
! match_pos = prev_match_pos;
! pos = prev_pos;
break;
}
if (patt[pos] == '+')
{
! pos = prev_pos;
break;
}
if (patt[pos] == '\\')
***************
*** 3322,3328 ****
if (patt[pos] == '\0')
break;
}
! match[match_pos++] = patt[pos];
}

match[match_pos] = '\0';
--- 3333,3346 ----
if (patt[pos] == '\0')
break;
}
! /* save position in case we need to back up on next loop cycle */
! prev_match_pos = match_pos;
! prev_pos = pos;
! /* must use encoding-aware processing here */
! len = pg_mblen(&patt[pos]);
! memcpy(&match[match_pos], &patt[pos], len);
! match_pos += len;
! pos += len;
}

match[match_pos] = '\0';
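
For readers who want to try the logic outside the backend, the following is a minimal standalone sketch of the patched loop, not the selfuncs.c code itself. It assumes UTF-8 input and uses a hypothetical helper, char_len(), in place of pg_mblen(); the function name fixed_prefix() and the exact set of "multiple match" characters it checks are also illustrative assumptions. It shows the two fixes: a backslash followed by an alphanumeric ends the prefix, and backing up over an optional character removes a whole character rather than one byte.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for pg_mblen(): byte length of the UTF-8 character
 * at s, clamped so that we never step past the terminating NUL.
 */
static int
char_len(const char *s)
{
    unsigned char c = (unsigned char) *s;
    int         want = 1;
    int         len = 1;

    if ((c & 0xE0) == 0xC0)
        want = 2;
    else if ((c & 0xF0) == 0xE0)
        want = 3;
    else if ((c & 0xF8) == 0xF0)
        want = 4;

    while (len < want && s[len] != '\0')
        len++;
    return len;
}

/*
 * Copy the fixed prefix of an anchored pattern into out, which must hold
 * at least strlen(patt) + 1 bytes.  Returns the prefix length in bytes,
 * or -1 if the pattern does not start with '^'.
 */
static int
fixed_prefix(const char *patt, char *out)
{
    size_t      pos = 1;        /* start at 1 to skip the leading ^ */
    size_t      out_pos = 0;
    size_t      prev_out_pos = 0;

    assert(patt != NULL && out != NULL);
    out[0] = '\0';
    if (patt[0] != '^')
        return -1;

    while (patt[pos] != '\0')
    {
        char        c = patt[pos];
        int         len;

        /* simplified (assumed) set of characters with multiple matches */
        if (c == '.' || c == '(' || c == '[' || c == '|' || c == '$')
            break;

        /* backslash + alphanumeric is an escape such as \d, not a quote */
        if (c == '\\' && isalnum((unsigned char) patt[pos + 1]))
            break;

        /* *, ? and { make the preceding character optional: drop it */
        if (c == '*' || c == '?' || c == '{')
        {
            out_pos = prev_out_pos;
            break;
        }

        /* + needs the preceding character at least once, so keep it */
        if (c == '+')
            break;

        /* backslash + non-alphanumeric quotes the next character */
        if (c == '\\')
        {
            pos++;
            if (patt[pos] == '\0')
                break;
        }

        /* remember where this character starts in case we must back up */
        prev_out_pos = out_pos;

        /* copy one whole (possibly multibyte) character */
        len = char_len(&patt[pos]);
        memcpy(&out[out_pos], &patt[pos], (size_t) len);
        out_pos += (size_t) len;
        pos += (size_t) len;
    }

    out[out_pos] = '\0';
    return (int) out_pos;
}
```

With this sketch, the pattern ^abc\d yields the prefix "abc", where the pre-patch logic would have treated \d as a quoted "d" and produced "abcd". A pattern whose optional character is a two-byte UTF-8 character followed by * yields an empty prefix instead of a dangling lead byte.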
