Re: [POC] verifying UTF-8 using SIMD instructions

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [POC] verifying UTF-8 using SIMD instructions
Date: 2021-03-12 13:14:13
Message-ID: CAJ3gD9emykVjag37aES85mNfbq_QOg+i3VwhMTLg-qiAE7Ai1Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 9 Mar 2021 at 17:14, John Naylor <john(dot)naylor(at)enterprisedb(dot)com> wrote:
> On Tue, Mar 9, 2021 at 5:00 AM Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> > Just a quick question before I move on to review the patch ... The
> > improvement looks like it is only meant for x86 platforms.
>
> Actually it's meant to be faster for all platforms, since the C fallback is quite a bit different from HEAD. I've found it to be faster on ppc64le. An earlier version of the patch was a loser on 32-bit Arm because of alignment issues, but if you could run the test script attached to [1] on 64-bit Arm, I'd be curious to see how it does on 0002, and whether 0003 and 0004 make things better or worse. If there is trouble building on non-x86 platforms, I'd want to fix that also.

On my Arm64 VM :

HEAD :
mixed | ascii
-------+-------
1091 | 628
(1 row)

PATCHED :
mixed | ascii
-------+-------
681 | 119

So the fallback function does show improvements on Arm64.

I guess, if at all we use the equivalent Arm NEON intrinsics, the
"mixed" figures will be close to the "ascii" figures, going by your
figures on x86.

> > Can this be
> > done in a portable way by arranging for auto-vectorization ? Something
> > like commit 88709176236caf. This way it would benefit other platforms
> > as well.
>
> I'm fairly certain that the author of a compiler capable of doing that in this case would be eligible for some kind of AI prize. :-)

:)

I was not thinking about auto-vectorizing the code in
pg_validate_utf8_sse42(). Rather, I was considering auto-vectorization
inside the individual helper functions that you wrote, such as
_mm_setr_epi8(), shift_right(), bitwise_and(), prev1(), splat(),
saturating_sub() etc. I myself am not sure whether it is feasible to
write code that auto-vectorizes all these function definitions.
saturating_sub() seems hard, but I could see the gcc docs mentioning
support for generating such instructions for a particular code loop.
But for the index lookup function() it seems impossible to generate
the needed index lookup intrinsics. We can have platform-specific
function definitions for such exceptional cases.

I am considering this only because that would make the exact code work
on other platforms like arm64 and ppc, and won't have to have
platform-specific files. But I understand that it is easier said than
done. We will have to process the loop in pg_validate_utf8_sse42() in
128-bit chunks, and pass each chunk to individual functions, which
could mean extra work and extra copy in extracting the chunk data and
passing it around, which may make things drastically slow. You are
passing around the chunks using __m128i type, so perhaps it means
passing around just a reference to the simd registers. Not sure.

--
Thanks,
-Amit Khandekar
Huawei Technologies

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2021-03-12 13:33:14 Re: pg_amcheck contrib application
Previous Message Amit Langote 2021-03-12 12:47:18 Re: posgres 12 bug (partitioned table)