Re: [POC] verifying UTF-8 using SIMD instructions

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [POC] verifying UTF-8 using SIMD instructions
Date: 2021-03-12 15:36:51
Message-ID: CAFBsxsHA9fB=fwGbeONnoiJp050pSiaG_Wfuq0mpkw1X=ePBMQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Mar 12, 2021 at 9:14 AM Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
wrote:
>
> On my Arm64 VM :
>
> HEAD :
>  mixed | ascii
> -------+-------
>   1091 |   628
> (1 row)
>
> PATCHED :
>  mixed | ascii
> -------+-------
>    681 |   119

Thanks for testing! Good, the speedup is about as much as I can hope for
using plain C. In the next patch I'll go ahead and squash in the ASCII fast
path, using a 16-byte stride, unless there are objections. I claim we can
live with the regression Heikki found on an old 32-bit Arm platform, since
it doesn't seem to be true of Arm in general.
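
For readers following along, here is a rough sketch of what an ASCII fast
path with a 16-byte stride can look like in plain C. It is not the code from
the patch: the function names chunk16_is_ascii() and ascii_prefix_len() are
made up for illustration, and the sketch ignores any extra rules a real
validator may need (such as rejecting zero bytes). The idea is to load 16
bytes as two 64-bit words and test all the high bits at once.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if none of the 16 bytes at s has its
 * high bit set, i.e. the whole chunk is 7-bit ASCII.
 */
static inline bool
chunk16_is_ascii(const unsigned char *s)
{
	uint64_t	half1;
	uint64_t	half2;

	/* memcpy avoids unaligned-access problems; compilers turn it into loads */
	memcpy(&half1, s, sizeof(half1));
	memcpy(&half2, s + 8, sizeof(half2));

	/* OR the halves together, then test the high bit of every byte at once */
	return ((half1 | half2) & UINT64_C(0x8080808080808080)) == 0;
}

/*
 * Hypothetical driver: returns how many leading bytes were confirmed to be
 * ASCII, always a multiple of 16. The caller would hand the remainder to
 * the regular byte-at-a-time validator.
 */
static size_t
ascii_prefix_len(const unsigned char *s, size_t len)
{
	size_t		done = 0;

	while (len - done >= 16 && chunk16_is_ascii(s + done))
		done += 16;

	return done;
}
```

Anything shorter than 16 bytes, and any chunk containing a non-ASCII byte,
falls through to the slower path, so the fast path only has to answer a
yes/no question per chunk.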

> I guess, if at all we use the equivalent Arm NEON intrinsics, the
> "mixed" figures will be close to the "ascii" figures, going by your
> figures on x86.

I would assume so.

> I was not thinking about auto-vectorizing the code in
> pg_validate_utf8_sse42(). Rather, I was considering auto-vectorization
> inside the individual helper functions that you wrote, such as
> _mm_setr_epi8(), shift_right(), bitwise_and(), prev1(), splat(),

If the PhD holders who came up with this algorithm thought it possible to
do it that way, I'm sure they would have. In reality, simdjson has
different files for SSE4, AVX, AVX512, NEON, and Altivec. We can
incorporate any of those as needed. That's a PG15 project, though, and I'm
not volunteering.

--
John Naylor
EDB: http://www.enterprisedb.com
