Re: badly calculated width of emoji in psql

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Jacob Champion <pchampion(at)vmware(dot)com>
Cc: "horikyota(dot)ntt(at)gmail(dot)com" <horikyota(dot)ntt(at)gmail(dot)com>, "laurenz(dot)albe(at)cybertec(dot)at" <laurenz(dot)albe(at)cybertec(dot)at>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: badly calculated width of emoji in psql
Date: 2021-07-07 18:19:34
Message-ID: CAFj8pRCL=yYai6io8+xn7JYx5L3Qd+yWmQnJbdyzQx19QLhk=g@mail.gmail.com
Lists: pgsql-hackers

On Wed, 7 Jul 2021 at 20:03, Jacob Champion <pchampion(at)vmware(dot)com>
wrote:

> On Mon, 2021-04-05 at 14:07 +0900, Kyotaro Horiguchi wrote:
> > At Fri, 2 Apr 2021 11:51:26 +0200, Pavel Stehule <
> pavel(dot)stehule(at)gmail(dot)com> wrote in
> > > with this patch, the formatting is correct
> >
> > I think the hardest part of this issue is that we don't have a
> > reasonably authoritative source that determines character width, and
> > that the presentation depends heavily on the environment.
>
> > Unicode 9 and/or 10 define the character properties "Emoji" and
> > "Emoji_Presentation", and tr51[1] says that
> >
> > > Emoji are generally presented with a square aspect ratio, which
> > > presents a problem for flags.
> > ...
> > > Current practice is for emoji to have a square aspect ratio, deriving
> > > from their origin in Japanese. For interoperability, it is recommended
> > > that this practice be continued with current and future emoji. They
> > > will typically have about the same vertical placement and advance
> > > width as CJK ideographs. For example:
> >
> > OK, even putting flags aside, the first table in [2] asserts that "#",
> > "*", and "0-9" are emoji characters. But we, and I think no one else,
> > ever present them in two columns. The table also has many mysterious
> > holes that I haven't looked into.
>
> I think that's why Emoji_Presentation is false for those characters --
> they _could_ be presented as emoji if the UI should choose to do so, or
> if an emoji presentation selector is used, but by default a text
> presentation would be expected.
>
> > We could use Emoji_Presentation=yes for this purpose, but, for example,
> > U+23E9 (BLACK RIGHT-POINTING DOUBLE TRIANGLE) has the property
> > Emoji_Presentation=yes while U+23ED (BLACK RIGHT-POINTING DOUBLE TRIANGLE
> > WITH VERTICAL BAR) does not, for a reason that is unclear to me. It
> > doesn't look like anything other than some kind of mistake.
>
> That is strange.
>
> > As for the environment: U+23E9, for example, is an emoji with
> > Emoji_Presentation=yes, but it is shown in one column on my
> > xterm. (I'm not sure what font I am using.)
>
> I would guess that's the key issue here. If we choose a particular
> width for emoji characters, is there anything keeping a terminal's font
> from doing something different anyway?
>
> Furthermore, if the stream contains an emoji presentation selector
> after a code point that would normally be text, shouldn't we change
> that glyph to have an emoji "expected width"?
>
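To make the presentation-selector question concrete, here is a minimal sketch in C, assuming that a base character immediately followed by U+FE0F (VARIATION SELECTOR-16, the emoji presentation selector) should occupy two columns and that the selector itself takes none. The helper names `default_width` and `sequence_width` are hypothetical, and `default_width` covers only a tiny hand-picked set of ranges rather than data generated from the Unicode files. Whether a given base character actually has a valid emoji variation sequence is not checked.

```c
#include <stddef.h>

/* U+FE0F VARIATION SELECTOR-16: requests emoji presentation */
#define EMOJI_PRESENTATION_SELECTOR 0xFE0Fu

/*
 * Hypothetical default width: a tiny illustrative subset,
 * NOT generated from the Unicode data files.
 */
static int
default_width(unsigned int cp)
{
    if (cp >= 0x23E9 && cp <= 0x23EC)   /* Emoji_Presentation=Yes */
        return 2;
    if (cp >= 0x1F600 && cp <= 0x1F64F) /* Emoticons block */
        return 2;
    return 1;
}

/*
 * Column width of a code point sequence.  A base character followed by
 * U+FE0F is given emoji (double) width, and the selector is consumed.
 * Assumption: the base/selector pair is not validated against the list
 * of emoji variation sequences.
 */
static int
sequence_width(const unsigned int *cps, size_t n)
{
    int width = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (cps[i] == EMOJI_PRESENTATION_SELECTOR)
            continue;           /* stray selector: zero columns */
        if (i + 1 < n && cps[i + 1] == EMOJI_PRESENTATION_SELECTOR)
        {
            width += 2;         /* emoji presentation requested */
            i++;                /* skip the selector */
        }
        else
            width += default_width(cps[i]);
    }
    return width;
}
```

With this rule, U+2764 (HEAVY BLACK HEART, Emoji_Presentation=No) alone counts as one column, while U+2764 U+FE0F counts as two. A terminal's font can of course still render either one differently.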
> I'm wondering if the most correct solution would be to have the user
> tell the client what width to use, using .psqlrc or something.
>

GNOME Terminal does it (VTE does it): there is an option for how to display
characters whose width is not well specified.

> > A possible compromise is that we treat all Emoji=yes characters
> > excluding ASCII characters as double-width and manually merge the
> > fragmented regions into reasonably larger chunks.
>
> We could also keep the fragments as-is and generate a full interval
> table, like common/unicode_combining_table.h. It looks like there's
> roughly double the number of emoji intervals as combining intervals, so
> hopefully adding a second binary search wouldn't be noticeably slower.
>
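The full-interval-table idea could look roughly like this minimal sketch in C, assuming the emoji table uses the same sorted, non-overlapping, inclusive-range layout as a combining-character table. `cp_interval`, `in_table`, and `char_width` are hypothetical names, not PostgreSQL's actual API, and both tables are small illustrative subsets; a real table would be generated from the Unicode data files.

```c
#include <stdbool.h>
#include <stddef.h>

/* One inclusive range of code points. */
typedef struct
{
    unsigned int first;
    unsigned int last;
} cp_interval;

/* Illustrative subsets only, sorted and non-overlapping. */
static const cp_interval combining_sample[] = {
    {0x0300, 0x036F},           /* Combining Diacritical Marks */
};

static const cp_interval emoji_sample[] = {
    {0x231A, 0x231B},
    {0x23E9, 0x23EC},
    {0x23F0, 0x23F0},
    {0x23F3, 0x23F3},
    {0x1F600, 0x1F64F},         /* Emoticons block */
};

#define LENGTHOF(a) (sizeof(a) / sizeof((a)[0]))

/* Binary search over sorted, non-overlapping inclusive intervals. */
static bool
in_table(unsigned int cp, const cp_interval *table, size_t n)
{
    size_t lo = 0;
    size_t hi = n;              /* search the half-open range [lo, hi) */

    if (n == 0 || cp < table[0].first || cp > table[n - 1].last)
        return false;
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (cp > table[mid].last)
            lo = mid + 1;
        else if (cp < table[mid].first)
            hi = mid;
        else
            return true;
    }
    return false;
}

/* Column width: combining marks 0, listed emoji 2, everything else 1. */
static int
char_width(unsigned int cp)
{
    if (in_table(cp, combining_sample, LENGTHOF(combining_sample)))
        return 0;
    if (in_table(cp, emoji_sample, LENGTHOF(emoji_sample)))
        return 2;
    return 1;
}
```

Because U+23ED lacks Emoji_Presentation it is absent from the sample table, so `char_width(0x23ED)` returns 1 while `char_width(0x23E9)` returns 2, mirroring the oddity discussed above. Each lookup is O(log n), so a second search over roughly twice as many intervals costs only a few extra comparisons.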
> --
>
> In your opinion, would the current one-line patch proposal make things
> strictly better than they are today, or would it have mixed results?
> I'm wondering how to help this patch move forward for the current
> commitfest, or if we should maybe return with feedback for now.
>

We can check how these characters are printed by modern versions of the most
common terminals. I am afraid it may be hard to find a solution that works
everywhere, because some libraries on some platforms are pretty obsolete.

Regards

Pavel

> --Jacob
>
