From: | Jeff Davis <pgsql(at)j-davis(dot)com> |
---|---|
To: | Nico Williams <nico(at)cryptonector(dot)com> |
Cc: | "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: PostgreSQL 18 GA press release draft |
Date: | 2025-09-12 19:13:23 |
Message-ID: | 2c1f238473721d5a277ce047f40158536aa1d72d.camel@j-davis.com |
Lists: | pgsql-hackers |
On Fri, 2025-09-12 at 13:21 -0500, Nico Williams wrote:
> On Fri, Sep 12, 2025 at 10:11:59AM -0700, Jeff Davis wrote:
> > The name PG_UNICODE_FAST is meant to convey that it provides full
> > unicode semantics for case mapping and pattern matching, while also
> > being fast because it uses memcmp for comparisons. While the name
> > PG_C_UTF8 is meant to convey that it's closer to what users of the libc
> > "C.UTF-8" locale might expect.
>
> How does one do form-insensitive comparisons?

If you mean case-insensitive matching, you can do:

  CASEFOLD(a) = CASEFOLD(b)

in any locale or provider, but it's best when using PG_UNICODE_FAST or
ICU, because it handles some nuances better. For instance:

  CASEFOLD('ß') = CASEFOLD('SS') AND
  CASEFOLD('ß') = CASEFOLD('ss') AND
  CASEFOLD('ß') = CASEFOLD('ẞ')

are all true in PG_UNICODE_FAST and "en-US-x-icu", but not in libc
collations.
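
As a rough illustration of why full case folding matters for these
comparisons, here is a small Python sketch. It uses Python's built-in
str.casefold(), which applies Unicode full case folding, as a stand-in for
the behavior described above. It is not PostgreSQL code, and the helper
names fold_equal and lower_equal are hypothetical.

```python
# Illustration only: str.casefold() applies Unicode full case folding.
# This is an analogy for the comparisons above, not PostgreSQL's implementation.

def fold_equal(a: str, b: str) -> bool:
    """Compare two strings after Unicode full case folding."""
    return a.casefold() == b.casefold()


def lower_equal(a: str, b: str) -> bool:
    """Compare two strings after plain lowercasing, for contrast."""
    return a.lower() == b.lower()


# The same three pairs as in the CASEFOLD() example.
pairs = [("ß", "SS"), ("ß", "ss"), ("ß", "ẞ")]

fold_results = [fold_equal(a, b) for a, b in pairs]
lower_results = [lower_equal(a, b) for a, b in pairs]

for (a, b), folded, lowered in zip(pairs, fold_results, lower_results):
    print(f"{a!r} vs {b!r}: casefold={folded} lower={lowered}")
```

With full case folding all three pairs compare equal, because 'ß' folds to
the two-character string 'ss'. Plain lowercasing leaves 'ß' as a single
character, so it does not match 'SS' or 'ss'.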

ICU also has case-insensitive collations, which offer a bit more
flexibility:

https://www.postgresql.org/docs/current/collation.html#COLLATION-NONDETERMINISTIC

Regards,
Jeff Davis