diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2b4fe0cb59..f8ced55daa 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -5970,6 +5970,138 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
+
+ Differences From XQuery (<literal>LIKE_REGEX</literal>)
+
+
+ Since SQL:2008, the SQL standard includes
+ a LIKE_REGEX operator that performs pattern
+ matching according to the XQuery regular expression
+ standard. PostgreSQL does not yet
+ implement this operator, but you can get very similar behavior using
+ the regexp_match() function.
+
+
+
+ Notable differences between the existing POSIX-based
+ regular-expression feature and XQuery regular expressions include:
+
+
+
+
+ XQuery character class subtraction is not supported. An example of
+ this feature is using the following to match only English
+ consonants: [a-z-[aeiou]].
+
+
+
+
+ XQuery allows a literal character in the pattern to be written as
+ an HTML-style Unicode character reference, for
+ instance &#NNNN;.
+ This is not supported by POSIX, but you can get the same effect by
+ writing \uNNNN. (The
+ equivalence is only exact when the database encoding is UTF-8.)
+
+
+
+
+ The SQL standard (not XQuery itself) attempts to cater for more
+ variants of newline than POSIX does. The
+ newline-sensitive matching options described above consider only
+ ASCII NL (\n) to be a newline, but SQL would have
+ us treat CR (\r), CRLF (\r\n)
+ (a Windows-style newline), and some Unicode-only characters like
+ LINE SEPARATOR (U+2028) as newlines as well.
+ Notably, . and \s should
+ count \r\n as one character not two according to
+ SQL.
+
+
+
+
+ XQuery character class shorthands \c,
+ \C, \i,
+ and \I are not supported.
+
+
+
+
+ XQuery character class elements
+ using \p{UnicodeProperty} or the
+ inverse \P{UnicodeProperty} are not supported.
+
+
+
+
+ POSIX interprets character classes such as \w
+ (see )
+ according to the prevailing locale (which you can control by
+ attaching a COLLATE clause to the operator or
+ function). XQuery specifies these classes by reference to Unicode
+ character properties, so equivalent behavior is obtained only with
+ a locale that follows the Unicode rules.
+
+
+
+
+ Of the character-entry escapes described in
+ ,
+ XQuery supports only \n, \r,
+ and \t.
+
+
+
+
+ XQuery does not support
+ the [:name:] syntax
+ for character classes within bracket expressions.
+
+
+
+
+ XQuery does not have lookahead or lookbehind constraints,
+ nor any of the constraint escapes described in
+ .
+
+
+
+
+ The metasyntax forms described in
+ do not exist in XQuery.
+
+
+
+
+ The regular expression flag letters defined by XQuery are
+ related to but not the same as the option letters for POSIX
+ (). While the
+ i and q options behave the
+ same, others do not.
+
+
+ XQuery's s (allow dot to match newline)
+ and m (allow ^
+ and $ to match at newlines) flags provide access
+ to the same behaviors as POSIX's n,
+ p and w flags, but
+ do not match the behavior of
+ POSIX's s and m flags.
+ Note in particular that dot-matches-newline is the default behavior
+ in POSIX but not XQuery.
+
+
+ Also, XQuery's x (ignore whitespace in pattern)
+ flag is noticeably different from POSIX's expanded-mode flag.
+ POSIX's x flag also allows # to
+ begin a comment in the pattern, and POSIX will not ignore a
+ whitespace character after a backslash.
+
+
+
+
+
+
@@ -11793,6 +11925,14 @@ table2-mapping
+
+
+
+ There are minor differences in the interpretation of regular
+ expression patterns used in like_regex filters, as
+ described in .
+
+
+
@@ -11872,6 +12012,36 @@ table2-mapping
+
+ Regular Expressions
+
+
+ SQL/JSON path expressions allow matching text to a regular expression
+ with the like_regex filter. For example, the
+ following SQL/JSON path query would case-insensitively match all
+ strings in an array that start with an English vowel:
+
+'$[*] ? (@ like_regex "^[aeiou]" flag "i")'
+
+
+
+
+ The SQL/JSON standard borrows its definition for regular expressions
+ from the LIKE_REGEX operator, which in turn uses the
+ XQuery standard. PostgreSQL does not currently support the
+ LIKE_REGEX operator. Therefore,
+ the like_regex filter is implemented using the
+ POSIX regular expression engine described in
+ . This leads to various minor
+ discrepancies from standard SQL/JSON behavior, which are cataloged in
+ .
+ Note, however, that the flag-letter incompatibilities described there
+ do not apply to SQL/JSON, as it translates the XQuery flag letters to
+ match what the POSIX engine expects.
+
+
+
+
 SQL/JSON Path Operators and Methods
@@ -12113,10 +12283,13 @@ table2-mapping
 like_regex
- Tests pattern matching with POSIX regular expressions
- (see ). Supported flags
- are i, s, m,
- x, and q.
+ Tests whether the first operand matches the regular expression
+ given by the second operand (see
+ ).
+ An optional flag string can be given.
+ Supported flags are i, m,
+ s, and q.
+
 ["abc", "abd", "aBdC", "abdacb", "babc"]
 $[*] ? (@ like_regex "^ab.*c" flag "i")
 "abc", "aBdC", "abdacb"
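
The behavior documented in this patch can be tried out with a short SQL sketch. It is not part of the patch itself. It assumes a PostgreSQL 12 or later server, a UTF-8 database encoding, and standard_conforming_strings turned on; the single-character test inputs are made up for illustration. The first query runs the like_regex example from the operator table through jsonb_path_query(). The other queries show one possible POSIX-side substitute for XQuery character class subtraction (a negative lookahead) and the \uNNNN escape mentioned in the list of differences.

```sql
-- like_regex with the "i" flag, using the array from the operator table.
-- Per the table row, the matches are "abc", "aBdC", and "abdacb".
SELECT jsonb_path_query('["abc", "abd", "aBdC", "abdacb", "babc"]',
                        '$[*] ? (@ like_regex "^ab.*c" flag "i")');

-- XQuery's [a-z-[aeiou]] is not accepted by the POSIX engine.
-- One way to approximate it (an assumption of this sketch, not the only
-- option) is a negative lookahead in front of the broader class.
SELECT 'x' ~ '^(?![aeiou])[a-z]$' AS x_is_consonant,   -- true
       'e' ~ '^(?![aeiou])[a-z]$' AS e_is_consonant;   -- false

-- XQuery's &#NNNN; character reference has no POSIX equivalent, but the
-- \uNNNN escape gives the same effect in a UTF-8 database.
-- U&'\00E9' is the SQL Unicode-escape string for U+00E9.
SELECT U&'\00E9' ~ '^\u00e9$' AS matches_e_acute;
```

Spelling the consonant ranges out by hand, as in [b-df-hj-np-tv-z], avoids the lookahead entirely and may be easier to read for a small fixed alphabet.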