Re: [POC] verifying UTF-8 using SIMD instructions

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [POC] verifying UTF-8 using SIMD instructions
Date: 2021-07-21 18:16:38
Message-ID: CAFBsxsEB8JcucixOam7X0PoZzQsPzY+p+u071QEHu+NySOhcrA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jul 21, 2021 at 11:29 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
wrote:

> Just for fun/experimentation, here's a quick (and probably too naive)
> translation of those helper functions to NEON, on top of the v15
> patch.

Neat! It's good to make it more architecture-agnostic, and I'm sure we can
use quite a bit of this. I don't know enough about NEON to comment
intelligently, but a quick glance through the simdjson source shows a couple
of differences that might be worth a look:

to_bool(const pg_u8x16_t v)
{
+#if defined(USE_NEON)
+ return vmaxvq_u32((uint32x4_t) v) != 0;

--> return vmaxvq_u8(*this) != 0;

vzero()
{
+#if defined(USE_NEON)
+ return vmovq_n_u8(0);

--> return vdupq_n_u8(0); // or equivalently, splat(0)

is_highbit_set(const pg_u8x16_t v)
{
+#if defined(USE_NEON)
+ return to_bool(bitwise_and(v, vmovq_n_u8(0x80)));

--> return vmaxvq_u8(v) > 0x7F;

(Technically, their convention is: is_ascii(v) { return vmaxvq_u8(v) < 0x80; },
but same effect)
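
To make the comparison concrete, here is a minimal sketch of the
horizontal-max idea behind both to_bool() and is_highbit_set(). The function
names are made up for illustration (they are not from the patch or from
simdjson), the helpers take a pointer to 16 bytes rather than a pg_u8x16_t,
and the scalar branch is only there so the snippet also builds on
non-AArch64 machines:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: true if any of the 16 bytes at p is nonzero. */
static bool
chunk_any_nonzero(const uint8_t *p)
{
	assert(p != NULL);
#if defined(__aarch64__)
	/* Horizontal max over all 16 byte lanes; nonzero iff some byte is set. */
	return vmaxvq_u8(vld1q_u8(p)) != 0;
#else
	/* Scalar stand-in (assumption: only used off AArch64). */
	uint8_t		acc = 0;

	for (int i = 0; i < 16; i++)
		acc |= p[i];
	return acc != 0;
#endif
}

/* Hypothetical helper: true if any byte has its high bit set (non-ASCII). */
static bool
chunk_has_highbit(const uint8_t *p)
{
	assert(p != NULL);
#if defined(__aarch64__)
	/* The largest byte exceeds 0x7F iff at least one byte is >= 0x80. */
	return vmaxvq_u8(vld1q_u8(p)) > 0x7F;
#else
	uint8_t		max = 0;

	for (int i = 0; i < 16; i++)
		if (p[i] > max)
			max = p[i];
	return max > 0x7F;
#endif
}
```

With this shape, the high-bit test needs no separate AND with 0x80: one
horizontal reduction and one scalar comparison do the job.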

+#if defined(USE_NEON)
+static pg_attribute_always_inline pg_u8x16_t
+vset(uint8 v0, uint8 v1, uint8 v2, uint8 v3,
+ uint8 v4, uint8 v5, uint8 v6, uint8 v7,
+ uint8 v8, uint8 v9, uint8 v10, uint8 v11,
+ uint8 v12, uint8 v13, uint8 v14, uint8 v15)
+{
+ uint8 pg_attribute_aligned(16) values[16] = {
+ v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15
+ };
+ return vld1q_u8(values);
+}

--> They have this strange beast instead:

// Doing a load like so end ups generating worse code.
// uint8_t array[16] = {x1, x2, x3, x4, x5, x6, x7, x8,
// x9, x10,x11,x12,x13,x14,x15,x16};
// return vld1q_u8(array);
uint8x16_t x{};
// incredibly, Visual Studio does not allow x[0] = x1
x = vsetq_lane_u8(x1, x, 0);
x = vsetq_lane_u8(x2, x, 1);
x = vsetq_lane_u8(x3, x, 2);
...
x = vsetq_lane_u8(x15, x, 14);
x = vsetq_lane_u8(x16, x, 15);
return x;

Since you aligned the array, that might not have the problem alluded to
above, and it looks nicer.
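
If anyone wants to compare the generated code for the two approaches, a
rough side-by-side sketch could look like the following. Everything here is
hypothetical scaffolding: the vec16 typedef, the function names, and the
array-argument signature (instead of sixteen scalar arguments) are my own
simplifications, and the non-AArch64 branch is a plain byte copy so the
snippet builds elsewhere:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__)
#include <arm_neon.h>
typedef uint8x16_t vec16;
#else
/* Scalar stand-in so the sketch compiles off AArch64 (assumption). */
typedef struct
{
	uint8_t		b[16];
} vec16;
#endif

/* Variant 1: build the vector with a single load from memory. */
static vec16
vset_load(const uint8_t v[16])
{
	assert(v != NULL);
#if defined(__aarch64__)
	return vld1q_u8(v);
#else
	vec16		x;

	memcpy(x.b, v, 16);
	return x;
#endif
}

/* Variant 2: build the vector lane by lane, in the style simdjson uses. */
static vec16
vset_lanes(const uint8_t v[16])
{
	assert(v != NULL);
#if defined(__aarch64__)
	uint8x16_t	x = vdupq_n_u8(0);

	/* Lane indexes must be compile-time constants, hence no loop. */
	x = vsetq_lane_u8(v[0], x, 0);
	x = vsetq_lane_u8(v[1], x, 1);
	x = vsetq_lane_u8(v[2], x, 2);
	x = vsetq_lane_u8(v[3], x, 3);
	x = vsetq_lane_u8(v[4], x, 4);
	x = vsetq_lane_u8(v[5], x, 5);
	x = vsetq_lane_u8(v[6], x, 6);
	x = vsetq_lane_u8(v[7], x, 7);
	x = vsetq_lane_u8(v[8], x, 8);
	x = vsetq_lane_u8(v[9], x, 9);
	x = vsetq_lane_u8(v[10], x, 10);
	x = vsetq_lane_u8(v[11], x, 11);
	x = vsetq_lane_u8(v[12], x, 12);
	x = vsetq_lane_u8(v[13], x, 13);
	x = vsetq_lane_u8(v[14], x, 14);
	x = vsetq_lane_u8(v[15], x, 15);
	return x;
#else
	vec16		x;

	memcpy(x.b, v, 16);
	return x;
#endif
}

/* Returns 1 if both construction paths produce the same 16 bytes. */
static int
vset_paths_agree(const uint8_t v[16])
{
	vec16		a = vset_load(v);
	vec16		b = vset_lanes(v);
	uint8_t		out_a[16];
	uint8_t		out_b[16];

#if defined(__aarch64__)
	vst1q_u8(out_a, a);
	vst1q_u8(out_b, b);
#else
	memcpy(out_a, a.b, 16);
	memcpy(out_b, b.b, 16);
#endif
	return memcmp(out_a, out_b, 16) == 0 && memcmp(out_a, v, 16) == 0;
}
```

This only checks that the two variants agree on the result; whether one
compiles to better code than the other would still need a look at the
assembly on the compilers we care about.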

--
John Naylor
EDB: http://www.enterprisedb.com
