Re: [POC] verifying UTF-8 using SIMD instructions

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [POC] verifying UTF-8 using SIMD instructions
Date: 2021-02-07 20:24:16
Message-ID: CAFBsxsHWAy+GS39rEbsczLb-3H1=P_93urv-85K0R7dUQfajwQ@mail.gmail.com
Lists: pgsql-hackers

Here is a more polished version of the function pointer approach, now
adapted to all multibyte encodings. Using the not-yet-committed tests from
[1], I found a thinko that made the test for NUL bytes not only wrong, but
probably also elided by the compiler. Doing the check correctly is
noticeably slower on pure ASCII input, but still several times faster than
before, so the conclusions haven't changed. I'll run full measurements
later this week, but I'm sharing the patch now for review.
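For readers unfamiliar with the technique, here is a minimal sketch of a word-at-a-time ASCII fast path that also rejects NUL bytes. This is not the code in the attached patch. The function names are hypothetical, and the sketch assumes 8-byte chunks and a caller that hands any remaining bytes to the ordinary per-character verifier. The NUL test relies on checking the high bits first. Once every byte is known to be at most 0x7f, adding 0x7f to each byte cannot carry into the next byte, and it sets a byte's high bit exactly when that byte was non-zero.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: true if all 8 bytes at s are ASCII and non-NUL. */
static bool
chunk_is_valid_ascii(const unsigned char *s)
{
    const uint64_t highbits = UINT64_C(0x8080808080808080);
    uint64_t chunk;

    /* memcpy avoids unaligned-access problems on strict platforms */
    memcpy(&chunk, s, sizeof(chunk));

    /* Any byte with its high bit set is not ASCII. */
    if (chunk & highbits)
        return false;

    /*
     * Every byte is now <= 0x7f, so byte + 0x7f <= 0xfe and there is no
     * carry between bytes. The sum's high bit is clear only for a zero byte.
     */
    if (((chunk + UINT64_C(0x7f7f7f7f7f7f7f7f)) & highbits) != highbits)
        return false;

    return true;
}

/*
 * Hypothetical driver: length of the prefix accepted by the fast path, in
 * whole 8-byte chunks. The caller verifies s[result..len) the slow way.
 */
static size_t
ascii_prefix_len(const unsigned char *s, size_t len)
{
    size_t done = 0;

    while (len - done >= 8 && chunk_is_valid_ascii(s + done))
        done += 8;
    return done;
}
```

With 16 bytes of plain ASCII, `ascii_prefix_len` accepts all 16. If a NUL or a non-ASCII byte appears in the second chunk, it stops after the first 8, and the slow path then reports the exact error position.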

[1]
https://www.postgresql.org/message-id/11d39e63-b80a-5f8d-8043-fff04201fadc@iki.fi

--
John Naylor
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
v1-0001-Add-an-ASCII-fast-path-to-multibyte-encoding-veri.patch application/octet-stream 7.6 KB
