| From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
|---|---|
| To: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
| Cc: | Peter Eisentraut <peter(at)eisentraut(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: Remaining dependency on setlocale() |
| Date: | 2025-11-29 20:50:56 |
| Message-ID: | 5f65b85740197ba6249ea507cddf609f84a6188b.camel@j-davis.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Wed, 2025-11-26 at 09:50 +0800, Chao Li wrote:
> * The retry logic implies that a single-byte char may become multiple
> bytes after folding, otherwise retry is not needed because you have
> allocated s+1 bytes for dest buffers. From this perspective, we
> should use two needed variables: neededA and neededB, if neededA !=
> neededB, then the two strings are different; if neededA == neededB,
> then we should be perform strncmp, but here we should pass neededA
> (or neededB as they are identical) to strncmp(al, bl, neededA).
Thank you.
It's actually worse than that: having a single 's' argument is just
completely wrong. Consider:
a: U&'x\0394\0394\0394'
b: U&'\0394\0394\0394'
There is no value for byte length 's' for which both 'a' and 'b' are
properly-encoded strings. So, the current code passes invalid byte
sequences to LOWER(), which is another pre-existing bug.
ltree_strncasecmp() is only used for checking equality of the first s
bytes of the query, so let's make it a safer API that just checks
prefix equality. Attached.
> * Based on the logic you implemented in 0004, first pg_strfold() has
> copied as many chars as possible to dest buffer, so when retry,
> ideally we can should resume instead of start over. However, if
> single-byte->multi-byte folding happens, we have no information to
> decide from where to resume.
Right.
That suggests that we might want some kind of lazy or iterator-based
API for string folding. We'd need to find the right way to do that with
ICU. If we find that it's a performance problem somewhere, we can look
into that. Do you think we need that now?
> From this perspective, in 0004, do we really need to take the try-
> the-best strategy for strlower_c()? If there are some other use cases
> that require data to be placed in dest buffer even if dest buffer
> doesn’t have enough space, then my patch [1] of changing
> strlower_libc_sb() should be considered.
I will look into that.
> SB_lower_char should be changed to C_IMatchText.
Updated comment.
> I think the comment should be updated accordingly, like “for ILIKE in
> the C locale”.
Done, thank you.
> * match is allocated with pattlen+1 bytes, is it long enough to hold
> pattlen multiple-byte chars?
>
> * match is not freed, but looks like it should be:
...
> Should “pos” be “part[pos]” assigning to match[match_pos++]?
All fixed, thank you! (I apologize for posting a patch in that state to
begin with...)
I also reorganized slightly to separate out the pg_iswcased() API into
its own patch, and moved the like_support.c changes from the ctype_is_c
patch (already committed: 1476028225) into the pattern prefixes patch.
Regards,
Jeff Davis
| Attachment | Content-Type | Size |
|---|---|---|
| v11-0001-Change-some-callers-to-use-pg_ascii_toupper.patch | text/x-patch | 1.5 KB |
| v11-0002-Make-regex-max_chr-depend-on-encoding-not-provid.patch | text/x-patch | 3.2 KB |
| v11-0003-Fix-inconsistency-between-ltree_strncasecmp-and-.patch | text/x-patch | 7.8 KB |
| v11-0004-Remove-char_tolower-API.patch | text/x-patch | 9.3 KB |
| v11-0005-Add-pg_iswcased.patch | text/x-patch | 5.7 KB |
| v11-0006-Use-multibyte-aware-extraction-of-pattern-prefix.patch | text/x-patch | 12.1 KB |
| v11-0007-fuzzystrmatch-use-pg_ascii_toupper.patch | text/x-patch | 3.2 KB |
| v11-0008-downcase_identifier-use-method-table-from-locale.patch | text/x-patch | 11.5 KB |
| v11-0009-Control-LC_COLLATE-with-GUC.patch | text/x-patch | 7.2 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Jeff Davis | 2025-11-29 21:04:00 | Re: Remaining dependency on setlocale() |
| Previous Message | Marcos Pegoraro | 2025-11-29 19:39:18 | Re: [PoC] XMLCast (SQL/XML X025) |