From: | Greg Stark <stark(at)mit(dot)edu> |
---|---|
To: | Bruce Momjian <bruce(at)momjian(dot)us> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Unicode grapheme clusters |
Date: | 2023-01-20 00:37:48 |
Message-ID: | CAM-w4HMTeJ9nwd_9Ohvaka8qNQ8s0Xw=-URaCP5MCe2buDwHcw@mail.gmail.com |
Lists: | pgsql-hackers |
This is how we've always documented it. Postgres treats code points as
"characters", not graphemes.
You don't need to go to anything as esoteric as emojis to see this either.
Accented characters like é have canonical forms that are multiple code
points, and in some character sets some accented characters can only be
represented that way.
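To make the é case concrete, here is a minimal sketch in Python, whose `len()` also counts code points. It is only an illustration using the standard `unicodedata` module, not a statement about Postgres internals; the variable names are made up for the example.

```python
import unicodedata

composed = "\u00e9"                                  # é as one precomposed code point
decomposed = unicodedata.normalize("NFD", composed)  # 'e' followed by U+0301 (combining acute)

# len() counts code points, so the same visible character has two lengths.
print(len(composed))    # 1
print(len(decomposed))  # 2

# The two strings render the same but are not equal until normalized.
print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```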
But I don't think there's any reason to consider changing the existing
functions. They have to be consistent with substr and the other string
manipulation functions.
We could add new functions to work with graphemes, but keeping them up to
date might bring more pain....
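As a rough sketch of what a grapheme-oriented function involves, the Python below counts only code points that are not combining marks. The helper name `approx_grapheme_count` is hypothetical, and this is a deliberate simplification rather than the full Unicode segmentation rules: it would miscount things like emoji joined with zero-width joiners. Its results also depend on the Unicode data bundled with the runtime (`unicodedata.unidata_version`), which is the kind of upkeep alluded to above.

```python
import unicodedata

def approx_grapheme_count(text: str) -> int:
    """Approximate grapheme count: skip combining marks (hypothetical helper)."""
    # unicodedata.combining() returns 0 for non-combining code points.
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)

print(approx_grapheme_count("\u00e9"))    # precomposed é
print(approx_grapheme_count("e\u0301"))   # decomposed é, same count
print(len("e\u0301"))                     # code point count differs
print(unicodedata.unidata_version)        # Unicode data version in use
```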