From: | Greg Stark <stark(at)mit(dot)edu> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Bruce Momjian <bruce(at)momjian(dot)us>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Unicode grapheme clusters |
Date: | 2023-01-24 16:40:01 |
Message-ID: | CAM-w4HNoonCZW3p=D9J2ev7LpOKXiAsgaH-XOUV=3gL_OJMwOA@mail.gmail.com |
Lists: | pgsql-hackers |
On Sat, 21 Jan 2023 at 13:17, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Probably our long-term answer is to avoid depending on wcwidth
> and use wcswidth instead. But it's hard to get excited about
> doing the legwork for that until popular libc implementations
> get it right.
Here's an interesting blog post about trying to do this in Rust:
https://tomdebruijn.com/posts/rust-string-length-width-calculations/
TL;DR: even counting the number of graphemes isn't enough, because
terminals typically (but not always) display emoji graphemes using two
columns.
At the end of the day Unicode kind of assumes a variable-width display
where the rendering is handled by something that has access to the
actual font metrics. So anything trying to line things up in columns
in a way that works with any rendering system down the line using any
font is going to be making a best guess.
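
To make the "best guess" concrete, here is a minimal Python sketch. It is not taken from the blog post or from PostgreSQL; the function name `estimated_columns` is made up for illustration. It assumes that characters with East Asian Width W or F take two columns and that combining marks and format characters (such as ZWJ and variation selectors) take none. Real terminals may disagree, especially for emoji sequences joined with ZWJ, which this per-code-point approach counts separately.

```python
import unicodedata


def estimated_columns(text: str) -> int:
    """Best-guess terminal column count for text (hypothetical helper).

    Assumptions, not guarantees about any terminal:
      - combining marks and format characters occupy zero columns
      - East Asian Wide/Fullwidth characters (including most emoji)
        occupy two columns
      - everything else occupies one column
    """
    columns = 0
    for ch in text:
        # Mn/Me: combining marks; Cf: format characters such as ZWJ.
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            columns += 2
        else:
            columns += 1
    return columns


if __name__ == "__main__":
    samples = ["abc", "e\u0301", "\U0001F600", "\u65e5\u672c"]
    for s in samples:
        # Show how code point count and estimated width diverge.
        print(repr(s), "code points:", len(s), "columns:", estimated_columns(s))
```

Note that the code point count and the column estimate already disagree for a single accented letter or a single emoji, and neither number is a grapheme count.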
--
greg