Re: badly calculated width of emoji in psql

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: badly calculated width of emoji in psql
Date: 2021-04-05 13:13:28
Message-ID: CAFj8pRC74VjsR9s3wuh0mrT+FAmLNvvxM7WObaoOFEiQdQTeog@mail.gmail.com
Lists: pgsql-hackers

On Mon, Apr 5, 2021 at 7:07, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
wrote:

> At Fri, 2 Apr 2021 11:51:26 +0200, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote in
> > with this patch, the formatting is correct
>
> I think the hardest point of this issue is that we don't have a
> reasonable authoritative source that determines character width. And
> that the presentation is heavily dependent on the environment.
>
> Unicode 9 and/or 10 defines the character properties "Emoji" and
> "Emoji_Presentation", and tr51[1] says that
>
> > Emoji are generally presented with a square aspect ratio, which
> > presents a problem for flags.
> ...
> > Current practice is for emoji to have a square aspect ratio, deriving
> > from their origin in Japanese. For interoperability, it is recommended
> > that this practice be continued with current and future emoji. They
> > will typically have about the same vertical placement and advance
> > width as CJK ideographs. For example:
>
> Ok, even putting aside flags, the first table in [2] asserts that "#",
> "*", "0-9" are emoji characters. But neither we nor, I think, anyone
> else ever presents them in two columns. And the table has many
> mysterious holes that I haven't looked into.
>
> We could use Emoji_Presentation=yes for the purpose, but for example,
> U+23E9(BLACK RIGHT-POINTING DOUBLE TRIANGLE) has the property
> Emoji_Presentation=yes but U+23ED(BLACK RIGHT-POINTING DOUBLE TRIANGLE
> WITH VERTICAL BAR) does not, for a reason unclear to me. It looks
> like nothing other than some kind of mistake.
>
> About environment, for example, U+23E9 is an emoji, and
> Emoji_Presentation=yes, but it is shown in one column on my
> xterm. (I'm not sure what font I am using.)
>
> [1] http://www.unicode.org/reports/tr51/
> [2] https://unicode.org/Public/13.0.0/ucd/emoji/emoji-data.txt
>
> A possible compromise is that we treat all Emoji=yes characters
> excluding ASCII characters as double-width and manually merge the
> fragmented regions into reasonably larger chunks.
>

ok
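
To make the proposed compromise concrete, here is a rough sketch, written in Python for brevity rather than in the C that psql uses. The range list is a small hand-picked sample standing in for the Emoji=yes entries of emoji-data.txt, and the names (drop_ascii, merge_ranges, is_wide, display_width) and the max_gap value are only assumptions for illustration, not anything from the patch.

```python
from bisect import bisect_right

# Hand-picked sample of Emoji=yes ranges (an assumption for illustration;
# the real list would be generated from emoji-data.txt).
SAMPLE_EMOJI_RANGES = [
    (0x0023, 0x0023),    # '#'
    (0x002A, 0x002A),    # '*'
    (0x0030, 0x0039),    # '0'..'9'
    (0x23E9, 0x23F3),
    (0x1F600, 0x1F64F),
    (0x1F680, 0x1F6C5),
]


def drop_ascii(ranges):
    """Remove the ASCII part (below U+0080) of every range."""
    out = []
    for lo, hi in ranges:
        if hi < 0x80:
            continue
        out.append((max(lo, 0x80), hi))
    return out


def merge_ranges(ranges, max_gap):
    """Merge ranges separated by at most max_gap code points.

    Code points inside a swallowed gap become double-width too;
    that is the price of having fewer, larger chunks.
    """
    merged = []
    for lo, hi in sorted(ranges):
        if merged and lo - merged[-1][1] - 1 <= max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


WIDE = merge_ranges(drop_ascii(SAMPLE_EMOJI_RANGES), max_gap=64)
WIDE_STARTS = [lo for lo, _ in WIDE]


def is_wide(cp):
    """Binary search the merged table for code point cp."""
    i = bisect_right(WIDE_STARTS, cp) - 1
    return i >= 0 and cp <= WIDE[i][1]


def display_width(s):
    """Two columns for table hits, one otherwise.

    Combining characters and other zero-width cases are ignored here.
    """
    return sum(2 if is_wide(ord(ch)) else 1 for ch in s)
```

With this sample, the two ranges starting at U+1F600 and U+1F680 collapse into one chunk, while "#", "*" and the digits stay single-width. How large a gap is acceptable to merge is a judgment call.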

It should be fixed in glibc:

https://sourceware.org/bugzilla/show_bug.cgi?id=20313

so we can check it
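
One quick way to see what the installed C library reports is to call wcwidth() directly. The sketch below uses Python's ctypes and assumes a POSIX system where ctypes.CDLL(None) exposes the libc symbols; libc_wcwidth is just a helper name chosen here. The result depends on the glibc version and the active locale, so no particular output is implied.

```python
import ctypes
import locale

# Use the locale from the environment; wcwidth() results depend on LC_CTYPE.
try:
    locale.setlocale(locale.LC_CTYPE, "")
except locale.Error:
    pass  # keep whatever locale is already active


def libc_wcwidth(ch):
    """Return the C library's wcwidth() for a single character."""
    libc = ctypes.CDLL(None)  # symbols already loaded in this process (POSIX only)
    libc.wcwidth.argtypes = [ctypes.c_wchar]
    libc.wcwidth.restype = ctypes.c_int
    return libc.wcwidth(ch)


if __name__ == "__main__":
    # 'A' as a baseline, then the code points mentioned in this thread.
    for ch in ("A", "\u23E9", "\u23ED", "\U0001F600"):
        print("U+%04X -> %d" % (ord(ch), libc_wcwidth(ch)))
```

A return value of -1 means the library treats the character as non-printable in the current locale, which is what happens for emoji in a non-UTF-8 locale.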

Regards

Pavel

>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>
