Re: [POC] verifying UTF-8 using SIMD instructions

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [POC] verifying UTF-8 using SIMD instructions
Date: 2021-07-21 15:29:21
Message-ID: CA+hUKGKbH3TSE9LiXqsOyYjvqBo838e=9PM2BR4cuPajYfCvMQ@mail.gmail.com
Lists: pgsql-hackers

On Sat, Mar 13, 2021 at 4:37 AM John Naylor
<john(dot)naylor(at)enterprisedb(dot)com> wrote:
> On Fri, Mar 12, 2021 at 9:14 AM Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> > I was not thinking about auto-vectorizing the code in
> > pg_validate_utf8_sse42(). Rather, I was considering auto-vectorization
> > inside the individual helper functions that you wrote, such as
> > _mm_setr_epi8(), shift_right(), bitwise_and(), prev1(), splat(),
>
> If the PhD holders who came up with this algorithm thought it possible to do it that way, I'm sure they would have. In reality, simdjson has different files for SSE4, AVX, AVX512, NEON, and Altivec. We can incorporate any of those as needed. That's a PG15 project, though, and I'm not volunteering.

Just for fun/experimentation, here's a quick (and probably too naive)
translation of those helper functions to NEON, on top of the v15
patch.
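To give a rough idea of what such a translation involves, here is a minimal sketch of platform-neutral wrappers for a few of the helpers named above (splat, bitwise AND, a per-byte right shift and prev1). It is not the code in the attached patches: the `Vector8` type and the `vector8_*` names are hypothetical. The sketch also assumes SSSE3 on x86 (for `_mm_alignr_epi8`) and falls back to plain C elsewhere. Here prev1 means "this block shifted up by one byte, with the last byte of the previous block shifted in", which is what the validator uses to look at each byte's predecessor.

```c
/* Hypothetical wrappers (not the patch's API) over 16-byte vectors. */
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
typedef uint8x16_t Vector8;

static inline Vector8 vector8_load(const uint8_t *p) { return vld1q_u8(p); }
static inline void vector8_store(uint8_t *p, Vector8 v) { vst1q_u8(p, v); }
static inline Vector8 vector8_splat(uint8_t c) { return vdupq_n_u8(c); }
static inline Vector8 vector8_and(Vector8 a, Vector8 b) { return vandq_u8(a, b); }
/* NEON has a native per-byte shift. */
static inline Vector8 vector8_shift_right4(Vector8 v) { return vshrq_n_u8(v, 4); }
/* Bytes 15 of prev, then 0..14 of input. */
static inline Vector8 vector8_prev1(Vector8 input, Vector8 prev)
{ return vextq_u8(prev, input, 15); }

#elif defined(__SSSE3__)
#include <tmmintrin.h>
typedef __m128i Vector8;

static inline Vector8 vector8_load(const uint8_t *p)
{ return _mm_loadu_si128((const __m128i *) p); }
static inline void vector8_store(uint8_t *p, Vector8 v)
{ _mm_storeu_si128((__m128i *) p, v); }
static inline Vector8 vector8_splat(uint8_t c) { return _mm_set1_epi8((char) c); }
static inline Vector8 vector8_and(Vector8 a, Vector8 b) { return _mm_and_si128(a, b); }
/* SSE has no 8-bit shift: shift 16-bit lanes, then mask off bits that
 * leaked in from the neighbouring byte. */
static inline Vector8 vector8_shift_right4(Vector8 v)
{ return _mm_and_si128(_mm_srli_epi16(v, 4), _mm_set1_epi8(0x0F)); }
static inline Vector8 vector8_prev1(Vector8 input, Vector8 prev)
{ return _mm_alignr_epi8(input, prev, 15); }

#else
/* Portable scalar fallback with the same semantics. */
typedef struct { uint8_t b[16]; } Vector8;

static inline Vector8 vector8_load(const uint8_t *p)
{ Vector8 v; memcpy(v.b, p, 16); return v; }
static inline void vector8_store(uint8_t *p, Vector8 v) { memcpy(p, v.b, 16); }
static inline Vector8 vector8_splat(uint8_t c)
{ Vector8 v; memset(v.b, c, 16); return v; }
static inline Vector8 vector8_and(Vector8 a, Vector8 b)
{ Vector8 r; for (int i = 0; i < 16; i++) r.b[i] = a.b[i] & b.b[i]; return r; }
static inline Vector8 vector8_shift_right4(Vector8 v)
{ Vector8 r; for (int i = 0; i < 16; i++) r.b[i] = (uint8_t) (v.b[i] >> 4); return r; }
static inline Vector8 vector8_prev1(Vector8 input, Vector8 prev)
{
    Vector8 r;
    r.b[0] = prev.b[15];
    for (int i = 1; i < 16; i++)
        r.b[i] = input.b[i - 1];
    return r;
}
#endif
```

Each wrapper maps to one or two intrinsics, so the validation loop itself can be written once. The main asymmetry is the byte shift: NEON has `vshrq_n_u8`, while SSE needs a 16-bit shift plus a mask.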

Attachment                                              Content-Type  Size
0001-XXX-Make-SIMD-code-more-platform-neutral.txt      text/plain    21.7 KB
0002-XXX-Add-ARM-NEON-support-for-UTF-8-validation.txt text/plain    6.5 KB
