Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "Mark Dilger" <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: "Postgres hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Andreas Karlsson" <andreas(at)proxel(dot)se>, "David Fetter" <david(at)fetter(dot)org>
Subject: Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]
Date: 2021-03-02 05:05:29
Message-ID: 244c5e77-3ca2-4d28-9d32-4ac53f33c68d@www.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 2, 2021, at 01:12, Mark Dilger wrote:
> I like the idea so I did a bit of testing. I think the following should not error, but does:
>
> +SELECT regexp_positions('foObARbEqUEbAz', $re$(?=beque)$re$, 'i');
> +ERROR: range lower bound must be less than or equal to range upper bound

Thanks for testing!

The bug is due to using an (inclusive,inclusive) range,
so for the example the code tried to construct a (7,6,'[]') range.

When trying to fix, I think I've found a general problem with ranges:

I'll use int4range() to demonstrate the problem:

First the expected error for what the patch tries to do internally using make_range().
This is all good:

# SELECT int4range(7,6,'[]');
ERROR: range lower bound must be less than or equal to range upper bound

I tried to fix this like this:

@ src/backend/utils/adt/regexp.c
- lower.val = Int32GetDatum(so + 1);
+ lower.val = Int32GetDatum(so);
lower.infinite = false;
- lower.inclusive = true;
+ lower.inclusive = false;
lower.lower = true;

Which would give the same result as doing:

SELECT int4range(6,6,'(]');
int4range
-----------
empty
(1 row)

Hmm. This "empty" value what surprise to me.
I would instead have assumed the canonical form "[7,7)".

If I wanted to know if the range is empty or not,
I would have guessed I should use the isempty() function, like this:

SELECT isempty(int4range(6,6,'(]'));
isempty
---------
t
(1 row)

Since we have this isempty() function, I don't see the point of discarding
the lower/upper vals, since they contain possibly interesting information
on where the empty range exists.

I find it strange two ranges of zero-length with different bounds are considered equal:

SELECT '[7,7)'::int4range = '[8,8)'::int4range;
?column?
----------
t
(1 row)

This seems like a bug to me. What am I missing here?

Unless fixed, then the way I see it, I don't think we can use int4range[] for regexp_positions(),
if we want to allow returning the positions for zero-length matches, which would be nice.

/Joel

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ajin Cherian 2021-03-02 05:08:34 Re: repeated decoding of prepared transactions
Previous Message Peter Geoghegan 2021-03-02 04:42:34 Re: 64-bit XIDs in deleted nbtree pages