Re: badly calculated width of emoji in psql

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Jacob Champion <pchampion(at)vmware(dot)com>, "horikyota(dot)ntt(at)gmail(dot)com" <horikyota(dot)ntt(at)gmail(dot)com>, "laurenz(dot)albe(at)cybertec(dot)at" <laurenz(dot)albe(at)cybertec(dot)at>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: badly calculated width of emoji in psql
Date: 2021-07-19 10:03:35
Message-ID: CAFj8pRCGkhApxBhtBP1abW9Wj+HtDaUuA63WudZb2oH8p445NQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, 19 Jul 2021 at 9:46, Michael Paquier <michael(at)paquier(dot)xyz>
wrote:

> On Wed, Jul 07, 2021 at 06:03:34PM +0000, Jacob Champion wrote:
> > I would guess that's the key issue here. If we choose a particular
> > width for emoji characters, is there anything keeping a terminal's font
> > from doing something different anyway?
>
> I'd say that we are doing our best in guessing what it should be,
> then. One cannot predict how fonts are designed.
>
> > We could also keep the fragments as-is and generate a full interval
> > table, like common/unicode_combining_table.h. It looks like there's
> > roughly double the number of emoji intervals as combining intervals, so
> > hopefully adding a second binary search wouldn't be noticeably slower.
>
> Hmm. Such things have a cost, and this one sounds costly with a
> limited impact. What do we gain except a better visibility with psql?
>
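Jacob's alternative, a generated interval table searched with a second binary search (the combining table is already searched this way), could look roughly like the following sketch. The struct, the function name interval_search and the three intervals are hypothetical. The intervals are Unicode block boundaries picked for illustration, not a verified list of wide characters. A real table would be generated from the Unicode data files, as common/unicode_combining_table.h is.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interval type, modeled on the combining-table layout. */
struct interval
{
	uint32_t	first;
	uint32_t	last;
};

/*
 * Illustrative excerpt only (hypothetical contents): block boundaries,
 * not an accurate list of double-width code points.
 */
static const struct interval emoji_wide[] = {
	{0x1F300, 0x1F64F},
	{0x1F680, 0x1F6FF},
	{0x1F900, 0x1F9FF},
};

/* Return true if ucs lies in one of n sorted, non-overlapping intervals. */
static bool
interval_search(uint32_t ucs, const struct interval *table, size_t n)
{
	size_t		lo = 0;
	size_t		hi = n;

	/* cheap rejection of everything outside the table's overall span */
	if (n == 0 || ucs < table[0].first || ucs > table[n - 1].last)
		return false;

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (ucs > table[mid].last)
			lo = mid + 1;		/* search the upper half */
		else if (ucs < table[mid].first)
			hi = mid;			/* search the lower half */
		else
			return true;		/* first <= ucs <= last */
	}
	return false;
}
```

With the early span check, code points below the first interval (all Latin, Cyrillic and CJK text, for example) are rejected after a couple of comparisons, so the cost of the second search falls mostly on characters near or inside the emoji blocks.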

The benefit is correct display. I checked the impact on the server side,
and ucs_wcwidth is used there only to calculate the error position. All
other uses are in psql.

Moreover, I checked the Unicode ranges, and I think that for common
languages the performance impact should be zero, because their characters
are typically below U+1100 and never reach the range checks. There may be
a possible (but very small) impact for some historic scripts or special
symbols. There is no impact for ranges that already return display width
2, because the new range is at the end of the list.
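The ordering argument can be illustrated with a simplified width function in the style of Markus Kuhn's wcwidth(), which ucs_wcwidth follows. sketch_wcwidth is hypothetical: it omits the control-character and combining-character checks and keeps only a few of the existing wide ranges, with the proposed emoji range appended last.

```c
#include <stdint.h>

/*
 * Hypothetical, simplified width function: returns 1 or 2 only.
 * Control and combining characters are not handled here.
 */
static int
sketch_wcwidth(uint32_t ucs)
{
	return 1 +
		(ucs >= 0x1100 &&		/* guard: lower code points stop here */
		 (ucs <= 0x115f ||		/* Hangul Jamo initial consonants */
		  (ucs >= 0xac00 && ucs <= 0xd7a3) ||	/* Hangul Syllables */
		  (ucs >= 0xff00 && ucs <= 0xff5f) ||	/* Fullwidth Forms */
		  (ucs >= 0x20000 && ucs <= 0x2ffff) ||	/* Supplementary Ideographic Plane */
		  (ucs >= 0x1f300 && ucs <= 0x1faff)));	/* proposed emoji range, tested last */
}
```

Because of the `ucs >= 0x1100 &&` guard, code points below U+1100 never evaluate the range list, and because `||` short-circuits, a character that matches an earlier wide range never reaches the emoji test. Only code points at or above U+1100 that match none of the earlier ranges pay for the extra comparison.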

I am not sure how widely PQdsplen is used outside psql, but I have no
reason to think that developers would prefer this function over the
built-in functionality of any development environment that supports
Unicode. So in this case I strongly prefer a correct result over the
current speed. (Note: I have experience from pspg development, where this
operation is really on the critical path. I tried some micro-optimizations
without significant effect: on a very big, unusual result (very wide and
very long, 100K rows) the difference was about 500 ms, and at that moment
the pager does nothing but string operations.)

Regards

Pavel

>
> > In your opinion, would the current one-line patch proposal make things
> > strictly better than they are today, or would it have mixed results?
> > I'm wondering how to help this patch move forward for the current
> > commitfest, or if we should maybe return with feedback for now.
>
> Based on the following list, it seems to me that [U+1F300, U+1FAFF]
> won't capture everything, like the country flags:
> http://www.unicode.org/emoji/charts/full-emoji-list.html
> --
> Michael
>
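Michael's point about flags can be checked directly: a country flag is written as a pair of Regional Indicator Symbols (U+1F1E6 to U+1F1FF, one per letter of the region code), and that block lies below U+1F300. The sketch below uses hypothetical helper names and only tests individual code points against the proposed range; how a terminal renders the pair as a single glyph is a separate question.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The single range from the one-line patch discussed in this thread. */
#define EMOJI_FIRST 0x1F300
#define EMOJI_LAST  0x1FAFF

/* Regional Indicator Symbols A..Z; a flag is a pair of these. */
#define REGIONAL_INDICATOR_A 0x1F1E6
#define REGIONAL_INDICATOR_Z 0x1F1FF

/* Hypothetical helper: is ucs inside the proposed emoji range? */
static bool
in_emoji_range(uint32_t ucs)
{
	return ucs >= EMOJI_FIRST && ucs <= EMOJI_LAST;
}

/* Hypothetical helper: code points for a region code such as "CZ". */
static void
flag_code_points(const char region[2], uint32_t out[2])
{
	for (size_t i = 0; i < 2; i++)
		out[i] = REGIONAL_INDICATOR_A + (uint32_t) (region[i] - 'A');
}

/* Count the regional indicator symbols that the range does not cover. */
static size_t
count_regional_indicators_missed(void)
{
	size_t		missed = 0;

	for (uint32_t ucs = REGIONAL_INDICATOR_A; ucs <= REGIONAL_INDICATOR_Z; ucs++)
		if (!in_emoji_range(ucs))
			missed++;
	return missed;
}
```

For "CZ" the pair is U+1F1E8 U+1F1FF, and count_regional_indicators_missed() returns 26: none of the regional indicator symbols fall inside [U+1F300, U+1FAFF].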
