| From: | Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, adam(dot)warland(at)infor(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
| Subject: | Re: BUG #19341: REPLACE() fails to match final character when using nondeterministic ICU collation |
| Date: | 2025-12-04 17:14:51 |
| Message-ID: | 792a0aeb486e240ea34f10f895e0368fc434b8b7.camel@cybertec.at |
| Lists: | pgsql-bugs |
On Wed, 2025-12-03 at 10:12 -0500, Tom Lane wrote:
> Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> writes:
> > On Tue, 2025-12-02 at 15:53 -0500, Tom Lane wrote:
> > > Looking at the code overall, I wonder if the outer loop doesn't have
> > > the same issue. The comments claim that we should be able to handle
> > > zero-length matches, but if the overall haystack is of length zero,
> > > we will fail to check for such a match.
>
>
> After further thought, it seems to me that this comment is an
> unjustified extrapolation from what Peter actually said, which was
> that the match substring could be physically shorter than the needle.
> Which is certainly true, for instance case-folding or accent-stripping
> might shorten the string. But it doesn't follow that a nonempty
> needle could ever match an empty substring; and that does not seem
> like it could be sane behavior to me. We're considering string
> comparison here, not regexes.
>
> We do require callers to eliminate the empty-needle case, and given
> that I think we could assume that match substrings must be at least
> 1 byte long. That assumption is what justifies the current API for
> these functions, and perhaps we can also simplify this loop by
> using it.
I think I get it. I don't see an explicit requirement for a non-empty
needle, but all callers of text_position_next_internal() handle that
case separately.

The attached v5 patch simplifies the loop to a do-while loop, assuming
that we cannot find a zero-length match.

I have also updated the comments to no longer mention the possibility
of an empty match, and for good measure I have added an Assert() that
the needle cannot be empty.
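
To illustrate the idea (this is a standalone sketch, not the code from the patch): the following toy search assumes that a match is at least 1 byte long, so the inner loop can be a do-while that also tests the candidate ending at the last byte of the haystack. The names toy_equal() and toy_search() are hypothetical, and the "collation" is a stand-in that ignores case and skips '-' in the haystack candidate; it does not use ICU or any PostgreSQL function.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a nondeterministic comparison: the candidate
 * substring equals the needle if they agree case-insensitively once every
 * '-' in the candidate is skipped.  A non-empty needle can therefore never
 * equal an empty candidate.
 */
static int
toy_equal(const char *cand, size_t cand_len,
          const char *needle, size_t needle_len)
{
    size_t i = 0;
    size_t j = 0;

    for (;;)
    {
        /* skip ignorable characters in the candidate only */
        while (i < cand_len && cand[i] == '-')
            i++;
        if (i == cand_len || j == needle_len)
            return i == cand_len && j == needle_len;
        if (tolower((unsigned char) cand[i]) !=
            tolower((unsigned char) needle[j]))
            return 0;
        i++;
        j++;
    }
}

/*
 * Greedy search: at the first start position that has any match, report
 * the longest matching substring.  Returns 1 on success, 0 if not found.
 */
static int
toy_search(const char *hay, size_t hay_len,
           const char *needle, size_t needle_len,
           size_t *match_start, size_t *match_len)
{
    /* callers are expected to handle the empty needle themselves */
    assert(needle_len > 0);

    for (size_t start = 0; start < hay_len; start++)
    {
        size_t len = 0;
        size_t best = 0;

        /*
         * A match is assumed to be at least 1 byte long, so start with
         * length 1 and stop only after the candidate that ends at the
         * final byte of the haystack has been tested.
         */
        do
        {
            len++;
            if (toy_equal(hay + start, len, needle, needle_len))
                best = len;     /* keep the longest match seen so far */
        } while (start + len < hay_len);

        if (best > 0)
        {
            *match_start = start;
            *match_len = best;
            return 1;
        }
    }
    return 0;
}
```

With this sketch, searching "xA-B" for "ab" reports a match at offset 1 with length 3, which includes the final character of the haystack, and an empty haystack simply yields no match.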

Yours,
Laurenz Albe
| Attachment | Content-Type | Size |
|---|---|---|
| v5-0001-Fix-greedy-substring-search-for-non-deterministic.patch | text/x-patch | 5.1 KB |