Re: add function argument names to regex* functions.

From: "Dian Fay" <di(at)nmfay(dot)com>
To: "jian he" <jian(dot)universality(at)gmail(dot)com>, "Jim Nasby" <jim(dot)nasby(at)gmail(dot)com>
Cc: "Peter Eisentraut" <peter(at)eisentraut(dot)org>, "PostgreSQL Hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: add function argument names to regex* functions.
Date: 2024-01-08 00:44:55
Message-ID: CY8WQVZ0PK1B.19QRO61WBALCP@nmfay.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu Jan 4, 2024 at 2:03 AM EST, jian he wrote:
> On Thu, Jan 4, 2024 at 7:26 AM Jim Nasby <jim(dot)nasby(at)gmail(dot)com> wrote:
> >
> > On 1/3/24 5:05 PM, Dian Fay wrote:
> >
> > Another possibility is `index`, which is relatively short and not a
> > reserved keyword ^1. `position` is not as precise but would avoid the
> > conceptual overloading of ordinary indices.
> >
> > I'm not a fan of "index" since that leaves the question of
> > whether it's 0 or 1 based. "Position" is a bit better, but I think
> > Jian's suggestion of "occurance" is best.
> >
> > We do have precedent for one-based `index` in Postgres: array types are
> > 1-indexed by default! "Occurrence" removes that ambiguity but it's long
> > and easy to misspell (I looked it up after typing it just now and it
> > _still_ feels off).
> >
> > How's "instance"?
> >
> > Presumably someone referencing arguments by name would have just looked up the names via \df or whatever, so presumably misspelling wouldn't be a big issue. But I think "instance" is OK as well.
> >
> > --
> > Jim Nasby, Data Architect, Austin TX
>
> regexp_instr: It has the syntax regexp_instr(string, pattern [, start
> [, N [, endoption [, flags [, subexpr ]]]]])
> oracle:
> REGEXP_INSTR (source_char, pattern, [, position [, occurrence [,
> return_opt [, match_param [, subexpr ]]]]] )
>
> "string" and "source_char" are almost the same descriptive, so maybe
> there is no need to change.
> "start" is better than "position", imho.
> "return_opt" is better than "endoption", (maybe we need change, for
> now I didn't)
> "flags" cannot be changed to "match_param", given it quite everywhere
> in functions-matching.html.
>
> similarly for function regexp_replace, oracle using "repplace_string",
> we use "replacement"(mentioned in the doc).
> so I don't think we need to change to "repplace_string".
>
> Based on how people google[0], I think `occurrence` is ok, even though
> it's verbose.
> to change from `N` to `occurrence`, we also need to change the doc,
> that is why this patch is more larger.
>
>
> [0]: https://www.google.com/search?q=regex+nth+match&oq=regex+nth+match&gs_lcrp=EgZjaHJvbWUyBggAEEUYOTIGCAEQRRg8MgYIAhBFGDzSAQc2MThqMGo5qAIAsAIA&sourceid=chrome&ie=UTF-8

The `regexp_replace` summary in table 9.10 is mismatched and still
specifies the first parameter name as `string` instead of `source`.
Since all the other functions use `string`, should `regexp_replace` do
the same or is this a case where an established "standard" diverges?

I noticed the original documentation for some of these functions is
rather disorganized; summaries explain `occurrence` without explaining
the prior `start` parameter, and detailed documentation in 9.7 is
usually a single paragraph per function running pell-mell through ifs
and buts without section headings, so entries in table 9.10 have to
reference the entire section 9.7.3 instead of their specific functions.
It's out of scope here, but should I bring this up on pgsql-docs?

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andy Fan 2024-01-08 02:41:01 Re: the s_lock_stuck on perform_spin_delay
Previous Message Michael Paquier 2024-01-08 00:16:19 Re: Add a perl function in Cluster.pm to generate WAL