Re: [POC] verifying UTF-8 using SIMD instructions

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [POC] verifying UTF-8 using SIMD instructions
Date: 2021-02-19 00:43:04
Message-ID: CAFBsxsFyDfp2d6=9gvPZSEmCDQyTLeZvkbqQTSvGGT3X+Fa0GQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Feb 15, 2021 at 9:32 PM John Naylor <john(dot)naylor(at)enterprisedb(dot)com> wrote:
>
> On Mon, Feb 15, 2021 at 9:18 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> >
> > I'm guessing that's because the unaligned access in check_ascii() is
> > expensive on this platform.

> Some possible remedies:

> 3) #ifdef out the ascii check for 32-bit platforms.

> 4) Same as the non-UTF8 case -- only check for ascii 8 bytes at a time.
> I'll probably try this first.

I've attached a couple patches to try on top of v4; maybe they'll help the
Arm32 regression. 01 reduces the stride to 8 bytes, and 02 applies on top
of v1 to disable the fallback fast path entirely on 32-bit platforms. A bit
of a heavy hammer, but it'll confirm (or not) your theory about unaligned
loads.
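
As a rough illustration of what an 8-byte ASCII stride can look like (this is
a sketch, not the code in the attached 01 patch), the check below copies 8
bytes into a uint64 with memcpy() so that no unaligned load is required, then
tests the high bits. The names demo_is_ascii_8() and demo_ascii_prefix_len()
are hypothetical, and rejecting zero bytes with the "add 0x7f" trick is an
assumption about what a real verifier would want, not something taken from
the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: true if the 8 bytes at s are all non-zero ASCII.
 */
static inline bool
demo_is_ascii_8(const unsigned char *s)
{
	uint64_t	chunk;

	/* memcpy() sidesteps alignment requirements on the source pointer */
	memcpy(&chunk, s, sizeof(chunk));

	/* any high bit set means a non-ASCII byte is present */
	if (chunk & UINT64_C(0x8080808080808080))
		return false;

	/*
	 * All bytes are <= 0x7f here, so adding 0x7f to each cannot carry into
	 * the next byte. The high bit ends up set only for non-zero bytes.
	 */
	chunk += UINT64_C(0x7f7f7f7f7f7f7f7f);
	return (chunk & UINT64_C(0x8080808080808080)) == UINT64_C(0x8080808080808080);
}

/*
 * Hypothetical helper: number of leading bytes (a multiple of 8) that
 * passed the 8-byte ASCII check.
 */
static size_t
demo_ascii_prefix_len(const unsigned char *s, size_t len)
{
	size_t		n = 0;

	assert(s != NULL || len == 0);

	while (len - n >= 8 && demo_is_ascii_8(s + n))
		n += 8;
	return n;
}
```

Anything shorter than 8 bytes, or any chunk that fails the check, would be
left to the regular per-character verifier.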

Also, I've included patches to explain more fully how I modeled non-UTF-8
performance while still using the UTF-8 tests. I think it was a useful
thing to do, and I have a theory that might predict how a non-UTF8 encoding
will perform with the fast path.

03A and 03B are independent of each other and conflict, but both apply on
top of v4 (don't need 02). Both replace the v4 fallback with the ascii
fastpath + pg_utf8_verifychar() in the loop, similar to utf-8 on master.
03A has a local static copy of pg_utf8_islegal(), and 03B uses the existing
global function. (On x86, you can disable SSE4 by passing
USE_FALLBACK_UTF8=1 to configure.)
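
To make the shape of "ascii fastpath + verifychar in the loop" concrete, here
is a minimal standalone sketch. It is not the 03A/03B code: demo_verifychar()
is a hypothetical stand-in for pg_utf8_verifychar(), demo_verifystr() is a
hypothetical driver loop, and the 8-byte stride is assumed for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: true if the 8 bytes at s are all non-zero ASCII */
static inline bool
demo_is_ascii_8(const unsigned char *s)
{
	uint64_t	chunk;

	memcpy(&chunk, s, sizeof(chunk));
	if (chunk & UINT64_C(0x8080808080808080))
		return false;
	/* bytes are <= 0x7f, so this add cannot carry between bytes */
	chunk += UINT64_C(0x7f7f7f7f7f7f7f7f);
	return (chunk & UINT64_C(0x8080808080808080)) == UINT64_C(0x8080808080808080);
}

/*
 * Hypothetical per-character verifier: returns the byte length of the
 * well-formed UTF-8 character at s, or -1 if it is invalid or truncated.
 */
static int
demo_verifychar(const unsigned char *s, size_t len)
{
	unsigned char c = s[0];
	int			l;

	if (c == 0)
		return -1;
	if (c < 0x80)
		return 1;
	if (c >= 0xc2 && c <= 0xdf)
		l = 2;
	else if ((c & 0xf0) == 0xe0)
		l = 3;
	else if (c >= 0xf0 && c <= 0xf4)
		l = 4;
	else
		return -1;				/* stray continuation or invalid lead byte */

	if (len < (size_t) l)
		return -1;				/* truncated sequence */
	for (int i = 1; i < l; i++)
		if ((s[i] & 0xc0) != 0x80)
			return -1;

	/* reject overlong forms, surrogates, and code points above U+10FFFF */
	if ((c == 0xe0 && s[1] < 0xa0) || (c == 0xed && s[1] > 0x9f) ||
		(c == 0xf0 && s[1] < 0x90) || (c == 0xf4 && s[1] > 0x8f))
		return -1;
	return l;
}

/* Hypothetical driver: returns the number of bytes verified as valid */
static size_t
demo_verifystr(const unsigned char *s, size_t len)
{
	const unsigned char *start = s;

	assert(s != NULL || len == 0);

	while (len > 0)
	{
		int			l;

		/* fast path: skip a whole chunk when it is pure ASCII */
		if (len >= 8 && demo_is_ascii_8(s))
		{
			s += 8;
			len -= 8;
			continue;
		}

		/* slow path: verify one character at a time */
		l = demo_verifychar(s, len);
		if (l < 0)
			break;
		s += l;
		len -= (size_t) l;
	}
	return (size_t) (s - start);
}
```

In this arrangement the per-character cost is paid on every multibyte
character, which is why the cost of calling the verifier dominates the
pure-multibyte numbers.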

While Clang 10 regressed for me on pure multibyte in a similar test
upthread, on Linux gcc 8.4 there isn't a regression relative to master at
all. IIRC, gcc wasn't as good as Clang when the API changed a few weeks ago,
so even where it regresses from v4, it is still faster than master. Clang
only regressed with my changes because it somehow handled master much better
to begin with.

x86-64 Linux gcc 8.4

master

 chinese | mixed | ascii
---------+-------+-------
    1453 |   857 |   428

v4 (fallback verifier written as a single function)

 chinese | mixed | ascii
---------+-------+-------
     815 |   514 |    82

v4 plus addendum 03A -- emulate non-utf-8 using a copy of
pg_utf8_islegal() as a static function

 chinese | mixed | ascii
---------+-------+-------
    1115 |   547 |    87

v4 plus addendum 03B -- emulate non-utf-8 using pg_utf8_islegal() as a
global function

 chinese | mixed | ascii
---------+-------+-------
    1279 |   604 |    82

(I also tried the same on ppc64le Linux, gcc 4.8.5 and while not great, it
never got worse than master either on pure multibyte.)

This is supposed to model the performance of a non-utf8 encoding, where we
don't have a bespoke function written from scratch. Here's my theory: If an
encoding has pg_*_mblen(), a global function, inside pg_*_verifychar(), it
seems it won't benefit as much from an ascii fast path as one whose
pg_*_verifychar() has no function calls. I'm not sure whether a compiler
can inline a global function's body into call sites in the unit where it's
defined. (I haven't looked at the assembly.) But recall that you didn't
commit 0002 from the earlier encoding change, because it didn't perform
well. I looked at that patch again, and while it inlined the
pg_utf8_verifychar() call, it still called the global function
pg_utf8_islegal().
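
The static-versus-global distinction can be set up for inspection with a
small sketch like the one below. Everything here is hypothetical:
demo_mblen_global() stands in for a global function such as pg_*_mblen(), and
demo_mblen_static() is a file-local copy with the same body. The sketch only
shows that the two are functionally identical; whether a given compiler
inlines either one depends on the compiler and its flags, so comparing the
generated assembly for the two callers would be the way to find out.

```c
#include <assert.h>
#include <stddef.h>

int			demo_mblen_global(const unsigned char *s);

/* External linkage: hypothetical stand-in for a global pg_*_mblen() */
int
demo_mblen_global(const unsigned char *s)
{
	if ((*s & 0x80) == 0)
		return 1;
	if ((*s & 0xe0) == 0xc0)
		return 2;
	if ((*s & 0xf0) == 0xe0)
		return 3;
	if ((*s & 0xf8) == 0xf0)
		return 4;
	return 1;
}

/* Internal linkage: same body, visible only in this translation unit */
static inline int
demo_mblen_static(const unsigned char *s)
{
	if ((*s & 0x80) == 0)
		return 1;
	if ((*s & 0xe0) == 0xc0)
		return 2;
	if ((*s & 0xf0) == 0xe0)
		return 3;
	if ((*s & 0xf8) == 0xf0)
		return 4;
	return 1;
}

/* Counts characters by calling the global helper on each one */
static size_t
demo_count_global(const unsigned char *s, size_t len)
{
	size_t		n = 0;

	assert(s != NULL || len == 0);
	while (len > 0)
	{
		size_t		l = (size_t) demo_mblen_global(s);

		if (l > len)
			break;				/* truncated trailing character */
		s += l;
		len -= l;
		n++;
	}
	return n;
}

/* Same loop, but calling the file-local helper */
static size_t
demo_count_static(const unsigned char *s, size_t len)
{
	size_t		n = 0;

	assert(s != NULL || len == 0);
	while (len > 0)
	{
		size_t		l = (size_t) demo_mblen_static(s);

		if (l > len)
			break;				/* truncated trailing character */
		s += l;
		len -= l;
		n++;
	}
	return n;
}
```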

If the above is anything to go by, on gcc at least, I don't think we need
to worry about a regression when adding an ascii fast path to non-utf-8
multibyte encodings.

Regarding SSE, I've added an ascii fast path in my local branch, but it's
not going to make as big a difference, because 1) the check is more
expensive in terms of branches than the C case, and 2) the general case is
so fast already that it's hard to improve upon. I just need to do some
testing and cleanup on the whole thing, and that'll be ready to share.

--
John Naylor
EDB: http://www.enterprisedb.com

Attachment                                            Content-Type         Size
addendum-01-8-byte-stride.patch                       application/x-patch  1.0 KB
addendum-02-remove-ascii-fast-path-32-bit.patch       application/x-patch  526 bytes
addendum-03A-emulate-non-utf8-multibyte-STATIC.patch  application/x-patch  3.6 KB
addendum-03B-emulate-non-utf8-multibyte-GLOBAL.patch  application/x-patch  2.6 KB
