Re: badly calculated width of emoji in psql

From: Jacob Champion <pchampion(at)vmware(dot)com>
To: "pavel(dot)stehule(at)gmail(dot)com" <pavel(dot)stehule(at)gmail(dot)com>
Cc: "laurenz(dot)albe(at)cybertec(dot)at" <laurenz(dot)albe(at)cybertec(dot)at>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "michael(at)paquier(dot)xyz" <michael(at)paquier(dot)xyz>, "horikyota(dot)ntt(at)gmail(dot)com" <horikyota(dot)ntt(at)gmail(dot)com>
Subject: Re: badly calculated width of emoji in psql
Date: 2021-07-26 17:27:24
Message-ID: 766ee8c3a11aadc724bec460d1be57079b3dc33c.camel@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 2021-07-23 at 17:42 +0200, Pavel Stehule wrote:
> čt 22. 7. 2021 v 0:12 odesílatel Jacob Champion <pchampion(at)vmware(dot)com> napsal:
> >
> > Pavel, I'd be interested to see what your benchmarks find with this
> > code. Does this fix the original issue for you?
>
> I can confirm that the original issue is fixed.

Great!

> I tested performance
>
> I had three data sets
>
> 1. typical data - mix ascii and utf characters typical for czech
> language - 25K lines - there is very small slowdown 2ms from 24 to
> 26ms (stored file of this result has 3MB)
>
> 2. the worst case - this reports has only emoji 1000 chars * 10K rows
> - there is more significant slowdown - from 160 ms to 220 ms (stored
> file has 39MB)

I assume the stored file size has grown with this patch, since we're
now printing the correct number of spacing/border characters?

> 3. a little bit of obscure datasets generated by \x and select * from
> pg_proc - it has 99K lines - and there are a lot of unicode
> decorations (borders). The line has 17K chars. (the stored file has
> 1.7GB)
> In this dataset I see a slowdown from 4300 to 4700 ms.

These results are lining up fairly well with my profiling. To isolate
the effects of the new algorithm (as opposed to printing time) I
redirected to /dev/null:

psql postgres -c "select repeat(unistr('\u115D'), 10000000);" > /dev/null

This is what I expect to be the worst case for the new patch: a huge
string consisting of nothing but \u115D, which is in the first interval
and therefore takes the maximum number of iterations during the binary
search.

For that command, the wall time slowdown with this patch was about
240ms (from 1.128s to 1.366s, or 21% slower). Callgrind shows an
increase of 18% in the number of instructions executed with the
interval table patch, all of it coming from PQdsplen (no surprise).
PQdsplen itself has a 36% increase in instruction count for that run.

I also did a microbenchmark of PQdsplen (code attached, requires Google
Benchmark [1]). The three cases I tested were standard ASCII
characters, a smiley-face emoji, and the worst-case \u115F character.

Without the patch:

------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------
...
BM_Ascii_mean 4.97 ns 4.97 ns 5
BM_Ascii_median 4.97 ns 4.97 ns 5
BM_Ascii_stddev 0.035 ns 0.035 ns 5
...
BM_Emoji_mean 6.30 ns 6.30 ns 5
BM_Emoji_median 6.30 ns 6.30 ns 5
BM_Emoji_stddev 0.045 ns 0.045 ns 5
...
BM_Worst_mean 12.4 ns 12.4 ns 5
BM_Worst_median 12.4 ns 12.4 ns 5
BM_Worst_stddev 0.038 ns 0.038 ns 5

With the patch:

------------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------------
...
BM_Ascii_mean 4.59 ns 4.59 ns 5
BM_Ascii_median 4.60 ns 4.60 ns 5
BM_Ascii_stddev 0.069 ns 0.068 ns 5
...
BM_Emoji_mean 11.8 ns 11.8 ns 5
BM_Emoji_median 11.8 ns 11.8 ns 5
BM_Emoji_stddev 0.059 ns 0.059 ns 5
...
BM_Worst_mean 18.5 ns 18.5 ns 5
BM_Worst_median 18.5 ns 18.5 ns 5
BM_Worst_stddev 0.077 ns 0.077 ns 5

So an incredibly tiny improvement in the ASCII case, which is
reproducible across multiple runs and not just a fluke (I assume
because the code is smaller now and has better cache line
characteristics?). A ~90% slowdown for the emoji case, and a ~50%
slowdown for the worst-performing characters. That seems perfectly
reasonable considering we're talking about dozens of nanoseconds.

> 9% looks too high, but in absolute time it is 400ms for 99K lines and
> very untypical data, or 2ms for more typical results., 2ms are
> nothing (for interactive work). More - this is from a pspg
> perspective. In psql there can be overhead of network, protocol
> processing, formatting, and more and more, and psql doesn't need to
> calculate display width of decorations (borders), what is the reason
> for slowdowns in pspg.

Yeah. Considering the alignment code is for user display, the absolute
performance is going to dominate, and I don't see any red flags so far.
If you're regularly dealing with unbelievably huge amounts of emoji, I
think the amount of extra time we're seeing here is unlikely to be a
problem. If it is, you can always turn alignment off. (Do you rely on
horizontal alignment for lines with millions of characters, anyway?)

Laurenz, Michael, what do you think?

--Jacob

[1] https://github.com/google/benchmark

Attachment Content-Type Size
main.cpp text/x-c++src 705 bytes
Makefile text/x-makefile 428 bytes

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2021-07-26 17:30:42 Re: Skip temporary table schema name from explain-verbose output.
Previous Message Robert Haas 2021-07-26 17:15:22 Re: Skip temporary table schema name from explain-verbose output.