Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

From: Gilles Darold <gilles(at)darold(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Chapman Flack <chap(at)anastigmatix(dot)net>, er(at)xs4all(dot)nl, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
Date: 2021-08-02 21:22:04
Message-ID: 672983ad-86de-9063-acc1-a1f85ef0e14d@darold.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Le 02/08/2021 à 01:21, Tom Lane a écrit :
> Gilles Darold <gilles(at)darold(dot)net> writes:
>> [ v5-0001-regexp-foo-functions.patch ]
> I've gone through this whole patch now, and found quite a lot that I did
> not like. In no particular order:
>
> * Wrapping parentheses around the user's regexp doesn't work. It can
> turn an invalid regexp into a valid one: for example 'a)(b' should draw
> a syntax error. With this patch, no error would be thrown, but the
> "outer" parens wouldn't do what you expected. Worse, it can turn a
> valid regexp into an invalid one: the metasyntax options described in
> 9.7.3.4 only work at the start of the regexp. So we have to handle
> whole-regexp cases honestly rather than trying to turn them into an
> instance of the parenthesized-subexpression case.
>
> * You did a lot of things quite inefficiently, apparently to avoid
> touching any existing code. I think it's better to extend
> setup_regexp_matches() and replace_text_regexp() a little bit so that
> they can support the behaviors these new functions need. In both of
> them, it's absolutely trivial to allow a search start position to be
> passed in; and it doesn't take much to teach replace_text_regexp()
> to replace only the N'th match.
>
> * Speaking of N'th, there is not much of anything that I like
> about Oracle's terminology for the function arguments, and I don't
> think we ought to adopt it. If we're documenting the functions as
> processing the "N'th match", it seems to me to be natural to call
> the parameter "N" not "occurrence". Speaking of the "occurrence'th
> occurrence" is just silly, not to mention long and easy to misspell.
> Likewise, "position" is a horribly vague term for the search start
> position; it could be interpreted to mean several other things.
> "start" seems much better. "return_opt" is likewise awfully unclear.
> I went with "endoption" below, though I could be talked into something
> else. The only one of Oracle's choices that I like is "subexpr" for
> subexpression number ... but you went with DB2's rather vague "group"
> instead. I don't want to use their "capture group" terminology,
> because that appears nowhere else in our documentation. Our existing
> terminology is "parenthesized subexpression", which seems fine to me
> (and also agrees with Oracle's docs).
>
> * I spent a lot of time on the docs too. A lot of the syntax specs
> were wrong (where you put the brackets matters), many of the examples
> seemed confusingly overcomplicated, and the text explanations needed
> copy-editing.
>
> * Also, the regression tests seemed misguided. This patch is not
> responsible for testing the regexp engine as such; we have tests
> elsewhere that do that. So I don't think we need complex regexps
> here. We just need to verify that the parameters of these functions
> act properly, and check their error cases. That can be done much
> more quickly and straightforwardly than what you had.
>
>
> So here's a revised version that I like better. I think this
> is pretty nearly committable, aside from the question of whether
> a too-large subexpression number should be an error or not.

Thanks a lot for the patch improvement and the guidance. I have read the
patch and I agree with your choices I think I was too much trying to
mimic the oraclisms. I don't think we should take care of the too-large
subexpression number, the regexp writer should always test its regular
expression and also this will not prevent him to chose the wrong capture
group number but just a non existing one.

Best regards,

--
Gilles Darold

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhihong Yu 2021-08-02 21:33:46 Re: Implementing Incremental View Maintenance
Previous Message Michail Nikolaev 2021-08-02 21:07:23 Re: Slow standby snapshot