Re: SIMILAR TO expressions translate wildcards where they shouldn't

From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: SIMILAR TO expressions translate wildcards where they shouldn't
Date: 2025-05-23 09:42:10
Message-ID: 5498224d1d5a6a3be7f29ae7725f76162d48372f.camel@cybertec.at
Lists: pgsql-bugs

On Fri, 2025-05-23 at 12:22 +0900, Michael Paquier wrote:
> Anyway, I don't think that the tests in the patch are complete. For
> example, '[[^](]' is transformed into '^(?:[[^](])$' in the SIMILAR TO
> conversion with the patch, and before the patch we get
> '^(?:[[^](?:])$'. Note the translation of the last parenthesis '(' to
> "(?:" when inside the character class, but your patch does not
> document that. AFAIU, we should not convert the parenthesis '(' while
> in a multi-level character class as the patch does, but we have no
> tests for it and the patch does not document this part, either.
>
> Could it be possible to split the single SIMILAR TO expression into
> multiple smaller pieces for each character that should not be
> converted while inside a character class? This is hard to parse as
> written in your proposal of patch.

I have changed the regression test as you suggested.

I also improved the code by adding more comments.
I renamed "incharclass" to "charclass_depth", which is more descriptive.

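Also, I had to do some more work on the handling of carets:
a closing bracket is a regular character when it directly follows the
opening bracket or the negating caret, as in []] and [^]], but not
after a second caret, as in [^^], where it closes the character class.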
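
The bracket rules discussed above can be illustrated with a small Python sketch. This is not the code from the patch (which is C inside PostgreSQL); it is a simplified model under stated assumptions: it only decides whether '(' becomes '(?:', it ignores escape characters and every other part of the SIMILAR TO translation, and the function and variable names are made up for illustration.

```python
def convert_parens_outside_classes(pattern: str) -> str:
    """Rewrite '(' as '(?:' only outside bracket expressions.

    Hypothetical, simplified model: no escape handling, no '%' or '_'
    translation, no anchoring of the result.
    """
    out = []
    depth = 0           # bracket nesting depth; 0 means outside any class
    at_start = False    # True while a ']' would still be a regular character
    seen_caret = False  # True once the negating '^' after '[' was consumed

    for ch in pattern:
        if depth == 0:
            if ch == "[":
                # start of a character class
                depth = 1
                at_start = True
                seen_caret = False
                out.append(ch)
            elif ch == "(":
                out.append("(?:")
            else:
                out.append(ch)
            continue

        # Inside a character class: copy everything unchanged.
        out.append(ch)
        if ch == "]" and not at_start:
            depth -= 1          # this bracket closes one level
        elif ch == "[":
            depth += 1          # nested bracket, e.g. [[:alpha:]]

        # Only the first caret right after '[' keeps us "at the start";
        # any other character (including a second caret) ends that state.
        if ch == "^" and at_start and not seen_caret:
            seen_caret = True
        else:
            at_start = False

    return "".join(out)
```

With this model, '[[^](]' comes back unchanged because the '(' is read at depth 1, while in '[]](', '[^]](' and '[^^](' the class is already closed when the '(' is reached, so it becomes '(?:'.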

Yours,
Laurenz Albe

Attachment: v2-0001-Fix-SIMILAR-TO-regex-translation-for-character-cl.patch (text/x-patch, 5.0 KB)
