From: | "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com> |
---|---|
To: | Thorsten Glaser <t(dot)glaser(at)tarent(dot)de> |
Cc: | "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org> |
Subject: | Re: BUG #14628: regex description in online documentation misleadingly/wrong |
Date: | 2017-04-20 16:02:08 |
Message-ID: | CAKFQuwawUKoqCMTj1AUcr0tfQdzOekL5VwTXT5M9zHNfN-RWZQ@mail.gmail.com |
Lists: | pgsql-bugs |
On Thu, Apr 20, 2017 at 8:25 AM, <t(dot)glaser(at)tarent(dot)de> wrote:
> The following bug has been logged on the website:
>
> Bug reference: 14628
> Logged by: Thorsten Glaser
> Email address: t(dot)glaser(at)tarent(dot)de
> PostgreSQL version: 9.6.1
> Operating system: GNU/Linux
> Description:
>
> https://www.postgresql.org/docs/9.6/static/functions-matching.html#FUNCTIONS-POSIX-REGEXP
> clearly says that ~ matches a POSIX regular expression.
>
> This is only somewhat true: this does match:
>
>
Based on what you wrote below, I might (though I lean toward not) modify
the chapter title to "POSIX (ARE) Regular Expressions".
I would then likely add a few more sentences before Table 9-14 (before the
existing intro sentence):
POSIX regular expressions come in multiple flavors, of which PostgreSQL
uses ARE by default. Further information on these flavors is presented in
the first subsection, "Regular Expression Details", below. What follows is
an overview of the general mechanics involved with any regular expression.
> The cause is likely this statement, burrowed way down in another chapter:
> “Note: PostgreSQL always initially presumes that a regular expression
> follows the ARE rules.”
>
>
While this could perhaps be improved, the above characterization seems
overblown. 9.7.3.1 is a sub-section of 9.7.3, so "[buried] way down" isn't
accurate. We choose to provide the high-level conceptual overview of
regular expressions first and then delve into ARE/BRE/ERE. That ordering
has caused few or no complaints from typical readers, for whom the defaults
are adequate and who just want to know how to get things to work in the
simple case.
> And indeed, it’s an ARE!
>
> tarent=> SELECT 'a\b' ~ '(?e)^[a\b]*$';
> ?column?
> ----------
> t
> (1 row)
>
>
> I find this extremely misleading (it also does not state whether it matches
> BRE or ERE by default, just “POSIX re”),
You missed the big bubble note in 9.7.3.1: "PostgreSQL always initially
presumes that a regular expression follows the ARE rules".
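
For illustration, a minimal sketch of how that default can be overridden per
pattern. It assumes the embedded-option prefixes described in the same
section ((?b), (?e), (?q)) and standard_conforming_strings = on, so that the
backslashes reach the regex engine unchanged. The second query is the one
from your report; the column aliases are made up for the example.

```sql
-- No prefix: the pattern is parsed as an ARE, where a backslash inside a
-- bracket expression starts an escape rather than standing for itself.
SELECT 'a\b' ~ '^[a\b]*$' AS are_default;

-- (?e): the rest of the pattern is parsed as a POSIX ERE.
SELECT 'a\b' ~ '(?e)^[a\b]*$' AS ere;

-- (?b): the rest of the pattern is parsed as a POSIX BRE.
SELECT 'a\b' ~ '(?b)^[a\b]*$' AS bre;

-- (?q): the rest of the pattern is taken as a literal string, so the dot
-- is not a wildcard here.
SELECT 'a.b' ~ '(?q)a.b' AS literal_same,
       'axb' ~ '(?q)a.b' AS literal_other;
```

The prefix has to appear at the very start of the pattern to take effect.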
> especially as it’s extremely
> important to know precisely what RE syntax you’re targeting when escaping a
> user-provided string into part of a RE (you have to precisely know where to
> escape and where to not escape, for example),
I'd say that is advanced usage, and since you were able to find the needed
documentation in 9.7.3.1, I'm not sure there is anything to fix based upon
this.
> which is why I personally
> always use POSIX standard RE (normally BRE).
>
So basically you feel it's necessary for us to redundantly emphasize the
fact that we default to ARE because it's different from your default choice
and, you imply but do not support, the choice of the majority of other
regular expression implementations. If one wants to understand the regular
expression implementation, they read 9.7.3; in all other places we can just
call them regular expressions. Now, as I note below, if you have specific
areas that you think need to be fixed, please point them out.
> Please indicate in *all* places in the documentation dealing with regular
> expressions that it’s about ARE and link ARE to the section in the manual
> explaining it -
> https://www.postgresql.org/docs/9.6/static/functions-matching.html#POSIX-SYNTAX-DETAILS
> - in all of those places. Also, make clear at the beginning of that section
> how to force standard POSIX RE (i.e. BRE and ERE).
>
You seem to have a very firm grasp of the topic and so might consider
offering some concrete suggestions and/or a patch. I've not seen an actual
factual omission or error in all of this, and while I firmly believe that
documentation can always be improved, and that the Tcl implementation that
we use has its quirks, I don't foresee the requested surgery happening from
scratch based upon this report. I've suggested a fairly easy clarification
at the top of the chapter (9.7.3) to at least bring immediate awareness of
the flavor issue. Does that work for you?
David J.