
Re: Unicode grapheme clusters

From:	Greg Stark <stark(at)mit(dot)edu>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Unicode grapheme clusters
Date:	2023-01-24 16:40:01
Message-ID:	CAM-w4HNoonCZW3p=D9J2ev7LpOKXiAsgaH-XOUV=3gL_OJMwOA@mail.gmail.com
Lists:	pgsql-hackers

On Sat, 21 Jan 2023 at 13:17, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Probably our long-term answer is to avoid depending on wcwidth
> and use wcswidth instead. But it's hard to get excited about
> doing the legwork for that until popular libc implementations
> get it right.

Here's an interesting blog post about trying to do this in Rust:

https://tomdebruijn.com/posts/rust-string-length-width-calculations/

TL;DR: even counting the number of graphemes isn't enough, because
terminals typically (but not always) display emoji graphemes using two
columns.

At the end of the day, Unicode kind of assumes a variable-width display
where the rendering is handled by something that has access to the
actual font metrics. So anything that tries to line things up in
columns, in a way that works with whatever rendering system and font
are used downstream, is going to be making a best guess.
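To make the graphemes-versus-columns point concrete, here is a rough Python sketch using only the standard `unicodedata` module. The helper names (`clusters`, `cluster_columns`, `columns`, `per_codepoint_columns`) are hypothetical. The cluster splitting is a simplified heuristic, not the full UAX #29 algorithm. The width rule (2 columns if any code point is East Asian Wide or Fullwidth, otherwise 1) is an assumed approximation of typical terminal behaviour, not what libc's wcwidth does or what any particular terminal renders.

```python
import unicodedata

ZWJ = "\u200d"   # ZERO WIDTH JOINER
VS16 = "\ufe0f"  # VARIATION SELECTOR-16 (emoji presentation)


def clusters(s):
    """Split s into approximate grapheme clusters.

    Simplified heuristic, NOT full UAX #29: a code point joins the
    current cluster if it is a combining mark (category M*), a ZWJ or
    VS16, or if the previous code point was a ZWJ.
    """
    out = []
    for ch in s:
        extends = bool(out) and (
            unicodedata.category(ch).startswith("M")
            or ch in (ZWJ, VS16)
            or out[-1].endswith(ZWJ)
        )
        if extends:
            out[-1] += ch
        else:
            out.append(ch)
    return out


def cluster_columns(cluster):
    """Best-guess columns for one cluster (assumption: 2 if any code
    point is East Asian Wide/Fullwidth, otherwise 1)."""
    if any(unicodedata.east_asian_width(ch) in ("W", "F") for ch in cluster):
        return 2
    return 1


def columns(s):
    """Per-cluster width estimate for the whole string."""
    return sum(cluster_columns(c) for c in clusters(s))


def per_codepoint_columns(s):
    """Naive per-code-point sum: nonspacing/enclosing marks and format
    characters count as 0, Wide/Fullwidth as 2, everything else as 1."""
    total = 0
    for ch in s:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        total += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return total


family = "\U0001F468\u200d\U0001F469\u200d\U0001F467"  # man ZWJ woman ZWJ girl
for text in ["abc", "e\u0301", "\U0001F600", family]:
    # ascii() keeps the output printable on non-UTF-8 consoles
    print(ascii(text), len(text), len(clusters(text)),
          per_codepoint_columns(text), columns(text))
```

For the ZWJ family sequence, the code point count is 5 and the heuristic finds 1 cluster. The per-code-point sum, in the spirit of adding up a width for each character, gives 6 columns, while the per-cluster guess gives 2. A terminal that doesn't support that ZWJ sequence may draw three separate faces instead, so either number can be wrong. That is the "best guess" problem in miniature.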

--
greg

In response to

Re: Unicode grapheme clusters at 2023-01-21 18:17:27 from Tom Lane

Responses

Re: Unicode grapheme clusters at 2023-01-24 16:47:32 from Isaac Morland
Re: Unicode grapheme clusters at 2023-01-24 19:20:32 from Bruce Momjian
