Re: [POC] verifying UTF-8 using SIMD instructions

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [POC] verifying UTF-8 using SIMD instructions
Date: 2021-03-19 19:24:06
Message-ID: CAFBsxsGkjcpFmqVNmE+T8AUV8XJNMsU+LOzu_HveQLvA5zjc6w@mail.gmail.com
Lists: pgsql-hackers

I wrote:

> Thanks for testing! Good, the speedup is about as much as I can hope for
> using plain C. In the next patch I'll go ahead and squash in the ascii fast
> path, using 16-byte stride, unless there are objections. I claim we can
> live with the regression Heikki found on an old 32-bit Arm platform since
> it doesn't seem to be true of Arm in general.

In v8, I've squashed the 16-byte stride into 0002. I also removed the sole
holdout of hard-coded intrinsics by putting _mm_setr_epi8 inside a
variadic macro, and did some reordering of the one-line function
definitions. (As before, 0001 is not my patch, but parts of it are a
prerequisite to my regression tests.)
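
For readers who don't want to open the patch, here is a minimal sketch of
what an ascii fast path with a 16-byte stride can look like in plain C. It
is an illustration only, not the code in 0002: the names chunk_is_ascii16
and skip_ascii are hypothetical, and it checks only the high bit of each
byte, so any other checks the real function makes are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): return true if none of the
 * 16 bytes starting at s has its high bit set, i.e. all are ASCII.
 */
static bool
chunk_is_ascii16(const unsigned char *s)
{
	uint64_t	lo;
	uint64_t	hi;

	/* memcpy avoids unaligned-access and strict-aliasing problems */
	memcpy(&lo, s, sizeof(lo));
	memcpy(&hi, s + sizeof(lo), sizeof(hi));

	/* OR the halves together, then test the high bit of every byte */
	return ((lo | hi) & UINT64_C(0x8080808080808080)) == 0;
}

/*
 * Hypothetical driver: return the number of leading bytes, always a
 * multiple of 16, that were verified as ASCII. The caller would hand
 * the remainder to the ordinary multibyte-aware verifier.
 */
static size_t
skip_ascii(const unsigned char *s, size_t len)
{
	size_t		i = 0;

	while (len - i >= 16 && chunk_is_ascii16(s + i))
		i += 16;

	return i;
}
```

The fast path stops at the first 16-byte chunk that contains a non-ascii
byte, or when fewer than 16 bytes remain, so the slower path only sees the
input that actually needs it.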

Over in [1], I tested in-situ in a COPY FROM test and found a 10% speedup
with mixed ascii and multibyte in the copy code, i.e. with buffer and
storage taken completely out of the picture.

[1]
https://www.postgresql.org/message-id/CAFBsxsEybzagsrmuoLsKYx417Sce9cgnM91nf8f9HKGLadixPg%40mail.gmail.com
--
John Naylor
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
v8-0001-Add-noError-argument-to-encoding-conversion-funct.patch application/octet-stream 225.0 KB
v8-0002-Replace-pg_utf8_verifystr-with-two-faster-impleme.patch application/octet-stream 49.3 KB
