Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gilles Darold <gilles(at)darold(dot)net>
Cc: Gilles Darold <gillesdarold(at)gmail(dot)com>, Chapman Flack <chap(at)anastigmatix(dot)net>, er(at)xs4all(dot)nl, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace
Date: 2021-07-30 21:38:24
Message-ID: 1124806.1627681104@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Gilles Darold <gilles(at)darold(dot)net> writes:
> Le 26/07/2021 à 21:56, Tom Lane a écrit :
>> I'm inclined to just drop the regexp_replace additions. I don't think
>> that the extra parameters Oracle provides here are especially useful.
>> They're definitely not useful enough to justify creating compatibility
>> hazards for.

> I would not say that being able to replace the Nth occurrence of a
> pattern matching is not useful but i agree that this is not a common
> case with replacement. Both Oracle [1] and IBM DB2 [2] propose this form
> and I have though that we can not have compatibility issues because of
> the different data type at the 4th parameter.

Well, here's an example of the potential issues:

regression=# create function rr(text,text,text,text) returns text
regression-# language sql as $$select 'text'$$;
CREATE FUNCTION
regression=# create function rr(text,text,text,int4) returns text
language sql as $$select 'int4'$$;
CREATE FUNCTION
regression=# select rr('a','b','c','d');
rr
------
text
(1 row)

regression=# select rr('a','b','c',42);
rr
------
int4
(1 row)

So far so good, but:

regression=# prepare rr as select rr('a','b','c',$1);
PREPARE
regression=# execute rr(12);
rr
------
text
(1 row)

So somebody trying to use the 4-parameter Oracle form from, say, JDBC
would get bit if they were sloppy about specifying parameter types.

The one saving grace is that digits aren't valid regexp flags,
so the outcome would be something like

regression=# select regexp_replace('a','b','c','12');
ERROR: invalid regular expression option: "1"

which'd be less difficult to debug than silent misbehavior.
Conversely, if you thought you were passing flags but it somehow
got interpreted as a start position, that would fail too:

regression=# prepare rri as select rr('a','b','c', $1::int);
PREPARE
regression=# execute rri('gi');
ERROR: invalid input syntax for type integer: "gi"
LINE 1: execute rri('gi');
^

Still, I bet a lot that we'd see periodic bug reports complaining
that it doesn't work.

> Anyway, maybe we can just
> rename the function even if I would prefer that regexp_replace() be
> extended. For example:
>     regexp_replace(source, pattern, replacement [, flags ]);
>     regexp_substitute(source, pattern, replacement [, position ] [,
> occurrence ] [, flags ]);

Hmm. Of course the entire selling point of this patch seems to be
bug-compatibility with Oracle, so using different names is largely
defeating the point :-(

Maybe we should just hold our noses and do it. The point that
you'd get a recognizable failure if the wrong function were chosen
reassures me a little bit. We've seen a lot of cases where this
sort of ambiguity results in the system just silently doing something
different from what you expected, and I was afraid that that could
happen here.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-07-30 21:49:09 Re: [PATCH] Hooks at XactCommand level
Previous Message Heikki Linnakangas 2021-07-30 21:33:34 Re: Split xlog.c