Re: regexp_replace 'g' flag

From: David Johnston <polobo(at)yahoo(dot)com>
To: pgsql-docs(at)postgresql(dot)org
Subject: Re: regexp_replace 'g' flag
Date: 2013-09-06 14:53:48
Message-ID: 1378479228211-5769912.post@n5.nabble.com
Lists: pgsql-docs

Sorry if you get this twice, but I use Nabble and hadn't subscribed to the list,
so my original messages went into the verification queue. I've subscribed now
and am re-posting in the hope that it goes through cleanly.

See my self-quote comment and my direct comment at the end.

David Johnston wrote
>
> Tom Lane-2 wrote
>> Bruce Momjian <bruce@> writes:
>>> On Thu, Sep 5, 2013 at 08:37:44PM -0400, Bruce Momjian wrote:
>>>> Why doesn't the 'g' flag appear in this table?
>>>> http://www.postgresql.org/docs/9.2/static/functions-matching.html#POSIX-EMBEDDED-OPTIONS-TABLE
>>
>>> Is it because the table has generic pattern modifiers and 'g' is only
>>> relevant for regexp_replace? I assume so.
>>
>> The table is specifically about ARE options, and 'g' is *not* one of
>> those. Adding 'g' to the table would be wrong.
>>
>> It does seem to me to be a bit confusing that the text description of
>> substring() mentions 'i' and 'g' explicitly, when only 'i' is listed in
>> the table. You could make a case for phrasing along the line of
>> "substring() supports the 'g' flag that specifies ..., as well as all the
>> flags listed in Table 9-19". On the other hand, 'i' is the most useful of
>> the flags listed in the table by several country miles, and it doesn't
>> seem quite right to make people go off and consult the table to find out
>> about it.
>>
>> Not sure whether there's any real improvement that can be made here,
>> but I suppose it'd be nice if the text descriptions of substring() and
>> regexp_replace() handled this matter in the same way ...
>>
>> regards, tom lane
> substring(text from pattern) returns a scalar text which corresponds to
> either the entire first match found or the sub-portion of the first match
> corresponding to the first (and only first if more than one) matching
> group in the expression. It cannot act globally and so cannot accept or use
> a "g" flag even if there were some way to provide it.
>
> regexp_replace does handle a "g" flag because, while it too returns a
> scalar text, it returns the entire source string post-modification as
> opposed to only a subset thereof, and the modification itself uses the
> "g" flag to decide whether to replace one or ALL occurrences.
>
> I cannot find where "the text description of substring() mentions 'i' and
> 'g' explicitly"; could you maybe copy-paste a direct quote and also note
> the exact section of the page you are looking at?
>
> David J.

A little bit rambly but hopefully instructive...

"Embedded" is the key word here. Although this is not applicable to
PostgreSQL, in general an embedded modifier alters the interpretation of the
pattern between a "start" and an "end" modifier expression (PostgreSQL has
only a "start", with no "end", so an embedded modifier affects the entire
pattern). While it is possible to turn case insensitivity, newline
sensitivity (whether . matches a newline), and some other options on or off,
the "g" (global) option can only apply to the pattern as a whole and
conceptually belongs to the executor of the pattern rather than to the
pattern itself.

The "g" option is relevant to both "regexp_replace" and "regexp_matches".
In the latter case, using the "g" modifier allows more than one row to be
returned from the set-returning function (SRF). In both cases the entire
pattern is applied to the input text, and the "g" modifier tells the matching
algorithm not merely to affirm that there is at least one match but to
identify every section of the source text that matches the entire pattern.
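As a rough analogy only (this uses Python's `re` module, not PostgreSQL's ARE engine), the "g" decision can be modelled as an argument passed to the function that runs the pattern. The helpers `replace` and `matches` below are hypothetical stand-ins for regexp_replace and regexp_matches; `matches` is simplified to return the whole matched text rather than PostgreSQL's text[] rows.

```python
import re


def replace(source: str, pattern: str, replacement: str, global_flag: bool = False) -> str:
    """Hypothetical stand-in for regexp_replace(): whether to act globally
    is an argument of the call, not part of the pattern."""
    # count=0 replaces every match; count=1 stops after the first one.
    return re.sub(pattern, replacement, source, count=0 if global_flag else 1)


def matches(source: str, pattern: str, global_flag: bool = False) -> list[str]:
    """Hypothetical stand-in for regexp_matches(), simplified to return the
    whole matched text: without 'g' at most one result, with 'g' one result
    per non-overlapping match."""
    if global_flag:
        return [m.group(0) for m in re.finditer(pattern, source)]
    first = re.search(pattern, source)
    return [first.group(0)] if first else []


if __name__ == "__main__":
    text = "foo bar foo baz foo"
    print(replace(text, "foo", "X"))                    # X bar foo baz foo
    print(replace(text, "foo", "X", global_flag=True))  # X bar X baz X
    print(matches(text, "fo+"))                         # ['foo']
    print(matches(text, "fo+", global_flag=True))       # ['foo', 'foo', 'foo']
```

The pattern string is identical in every call; only the caller's choice changes how many matches are acted on.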

PostgreSQL is somewhat more limited in its use of these embedded options than
other implementations since, IIRC (and per my quick scan of the linked
documents just now), they can only appear at the beginning of the pattern and
so always apply to the entire pattern. Basically they provide a way to include
flags in the pattern when dealing with operator-based invocation. In other
implementations it is possible to write something like:

'(?i)this section is case insensitive(?-i)this section is case sensitive'

namely toggling these on/off within a pattern.
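For illustration, Python's `re` module (again only an analogy, not PostgreSQL) supports both styles: a leading `(?i)` that covers the whole pattern, and, since Python 3.6, scoped inline flags. Note that Python spells the per-section toggle as a scoped group `(?i:...)` rather than the open-ended `(?i)...(?-i)` form above, so this sketch is an assumption about how to express the same idea, not the syntax of any particular other engine.

```python
import re

# A leading embedded option applies to the whole pattern, like the
# leading-only form described above for PostgreSQL.
whole = re.compile(r"(?i)hello world")

# Python 3.6+ scoped inline flags: case-insensitive only inside (?i:...),
# case-sensitive for the rest of the pattern.
scoped = re.compile(r"(?i:this section is case insensitive)this section is case sensitive")

if __name__ == "__main__":
    print(bool(whole.fullmatch("HELLO World")))  # True
    print(bool(scoped.fullmatch("THIS SECTION IS CASE INSENSITIVEthis section is case sensitive")))  # True
    print(bool(scoped.fullmatch("this section is case insensitiveTHIS SECTION IS CASE SENSITIVE")))  # False
```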

Since the "g"lobal flag only makes sense in function-call invocations, it is
neither needed nor useful to embed it within the expression itself; i.e.,
operator-based invocations only deal with true/false evaluations, which are a
one-or-more-versus-none evaluation.
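A small sketch of why "g" adds nothing to a boolean test, again using Python's `re` as an analogy. `op_match` is a hypothetical stand-in for an operator-style check such as PostgreSQL's `text ~ pattern`; `op_match_via_all` performs the same test after collecting every match. The two always agree, because finding further matches cannot change a yes/no answer.

```python
import re


def op_match(source: str, pattern: str) -> bool:
    """Hypothetical analogue of an operator-style regex test: yes or no."""
    # re.search stops at the first match; nothing after it could change the answer.
    return re.search(pattern, source) is not None


def op_match_via_all(source: str, pattern: str) -> bool:
    """Same test, but collecting every match first (a 'global' search)."""
    return len(re.findall(pattern, source)) > 0


if __name__ == "__main__":
    for src, pat in [("foo bar foo", "foo"), ("foo bar", "baz"), ("abc", "b")]:
        # Both versions agree, so a global flag has nothing to contribute here.
        print(src, pat, op_match(src, pat), op_match_via_all(src, pat))
```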

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/regexp-replace-g-flag-tp5769814p5769912.html
Sent from the PostgreSQL - docs mailing list archive at Nabble.com.
