Re: SIMILAR TO expressions translate wildcards where they shouldn't

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: SIMILAR TO expressions translate wildcards where they shouldn't
Date: 2025-05-27 05:57:45
Message-ID: aDVUWaoZg8cW3B9x@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Fri, May 23, 2025 at 11:42:10AM +0200, Laurenz Albe wrote:
> I also improved the code by adding more comments.
> I renamed "incharclass" to "charclass_depth", which is more descriptive.
>
> Also, I had to work some more on handling carets:
> While the closing bracket is a regular character in []] and [^]], it
> is not in expressions like [^^].

The transformation with the patch:
'^(?:\.[.[:alnum:]_].[]%].*[^]$]\$[^^]\^[(](?:p))$'
And on HEAD:
'^(?:\.[.[:alnum:].].[].*].*[^]\$]\$[^^]\^[(](?:p))$'

If I am not missing something, '%', '_', '$', single and double carets
at the beginning are covered. '.' and '(' are not.

Anyway, that's really hard to parse so I would suggest to split each
check into queries of their own to show individual conversions in
these EXPLAIN outputs (we don't care if tese regexps are correct, just
want to check the output to the POSIX style). I am OK with the point
based on charclass_start to count the number of carets at the
beginning of a character class.

With some tweaks and the tests reworked, I am finishing with the
reviewed version attached. What do you think?
--
Michael

Attachment Content-Type Size
v3-0001-Fix-SIMILAR-TO-regex-translation-for-character-cl.patch text/x-diff 8.8 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2025-05-27 10:11:58 BUG #18937: New error class 10 for XQuery errors lacks a "standard" error code
Previous Message Tom Lane 2025-05-26 15:07:18 Re: BUG #18934: Even with WITH ADMIN OPTION, I Cannot Manage Role Memberships