Re: Unicode grapheme clusters

From:	Greg Stark <stark(at)mit(dot)edu>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Unicode grapheme clusters
Date:	2023-01-20 00:37:48
Message-ID:	CAM-w4HMTeJ9nwd_9Ohvaka8qNQ8s0Xw=-URaCP5MCe2buDwHcw@mail.gmail.com
Lists:	pgsql-hackers

This is how we've always documented it. Postgres treats code points as
"characters" not graphemes.

You don't need to go to anything as esoteric as emojis to see this either.
Accented characters like é have canonically equivalent forms that are
multiple code points, and in some character sets some accented characters
can only be represented that way.
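
To make the point concrete, here is a minimal sketch in Python (not Postgres code) of the two equivalent forms of é. It only uses the standard unicodedata module; the variable names are illustrative.

```python
import unicodedata

# é as a single precomposed code point (U+00E9).
composed = "\u00e9"

# The canonically equivalent decomposed form: 'e' followed by
# U+0301 COMBINING ACUTE ACCENT.
decomposed = unicodedata.normalize("NFD", composed)

# Python's len() counts code points, so the two forms differ in
# length even though they render as the same single character.
composed_length = len(composed)
decomposed_length = len(decomposed)

# Normalizing back to NFC recovers the single code point form.
recomposed = unicodedata.normalize("NFC", decomposed)
```

A length function that counts code points would report different lengths for these two strings, which is the behaviour described above.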

But I don't think there's any reason to consider changing the existing
functions. They have to be consistent with substr and the other string
manipulation functions.

We could add new functions to work with graphemes but it might bring more
pain keeping it up to date....
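
As a rough illustration of what such a function might look like and why it is a maintenance burden, here is a Python sketch. The name approx_grapheme_length is hypothetical, and the logic is a deliberate simplification: it only skips combining marks and does not implement the real grapheme cluster rules from Unicode Standard Annex #29.

```python
import unicodedata

def approx_grapheme_length(s: str) -> int:
    """Count code points that are not combining marks.

    This is only an approximation of grapheme counting. It handles
    simple accented letters but not emoji ZWJ sequences, Hangul
    jamo, regional indicator pairs, and so on.
    """
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)

# Decomposed é: two code points, counted as one by the approximation.
decomposed = "e\u0301"

# Family emoji built from three emoji joined by ZERO WIDTH JOINER.
# The approximation counts every code point here, which shows how
# far it is from real grapheme cluster segmentation.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

# The answers depend on the Unicode data tables shipped with the
# runtime, which is the part that has to be kept up to date.
unicode_tables_version = unicodedata.unidata_version
```

Getting the emoji case right needs the full segmentation rules plus property tables that change between Unicode versions.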

In response to

Unicode grapheme clusters at 2023-01-19 00:19:59 from Bruce Momjian

Responses

Re: Unicode grapheme clusters at 2023-01-20 00:47:49 from Bruce Momjian
