use SSE2 for is_valid_ascii

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Andres Freund <andres(at)anarazel(dot)de>, Jelte Fennema <me(at)jeltef(dot)nl>
Subject: use SSE2 for is_valid_ascii
Date: 2022-08-10 06:50:14
Message-ID: CAFBsxsG=k8t=C457FXnoBXb=8iA4OaZkbFogFMachWif7mNnww@mail.gmail.com
Lists: pgsql-hackers

new thread [was: WIP Patch: Add a function that returns binary JSONB as a bytea]

> I wrote:
> > We can also shave a
> > few percent by having pg_utf8_verifystr use SSE2 for the ascii path. I
> > can look into this.
>
> Here's a patch for that. If the input is mostly ascii, I'd expect that
> part of the flame graph to shrink by 40-50% and give a small boost
> overall.

Here is an updated patch using the new USE_SSE2 symbol. The style is
different from the last one in that each stanza has platform-specific
code. I wanted to try it this way because is_valid_ascii() is already
written in SIMD-ish style using general purpose registers and bit
twiddling, so it seemed natural to see the two side-by-side. Sometimes
they can share the same comment. If we think this is bad for
readability, I can go back to one block each, but that leads to
duplicated code and makes it difficult to see what differs between
platforms, IMO.
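
For readers without the attachment at hand, here is a rough sketch of what
"the two side-by-side" can look like for one 16-byte chunk. It is not the
code from the patch: the function name chunk16_is_ascii is hypothetical, it
keys off the compiler's __SSE2__ macro rather than the USE_SSE2 symbol, and
it assumes the check should reject both bytes with the high bit set and
zero bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/*
 * Hypothetical helper (not from the patch): returns true if all 16 bytes
 * at s are nonzero and have the high bit clear.
 */
static bool
chunk16_is_ascii(const unsigned char *s)
{
	assert(s != NULL);

#if defined(__SSE2__)
	/* Unaligned 16-byte load into one vector register. */
	__m128i		chunk = _mm_loadu_si128((const __m128i *) s);

	/* movemask gathers the high bit of every byte into a 16-bit mask. */
	int			highbits = _mm_movemask_epi8(chunk);

	/* Lanes equal to zero become 0xFF, so their high bits show up here. */
	int			zeros = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk,
														  _mm_setzero_si128()));

	return (highbits | zeros) == 0;
#else
	/* Same idea with two 64-bit general purpose registers. */
	const uint64_t high_mask = UINT64_C(0x8080808080808080);
	uint64_t	half[2];
	uint64_t	highbit_cum = 0;
	uint64_t	nonzero_cum = high_mask;

	memcpy(half, s, sizeof(half));

	for (int i = 0; i < 2; i++)
	{
		/* Accumulate any high bits seen in the input. */
		highbit_cum |= half[i];

		/*
		 * Adding 0x7f to each byte sets its high bit unless the byte was
		 * zero. This is only reliable when no input byte has the high bit
		 * set (no carries between bytes), which is checked below.
		 */
		nonzero_cum &= half[i] + UINT64_C(0x7f7f7f7f7f7f7f7f);
	}

	if (highbit_cum & high_mask)
		return false;

	return nonzero_cum == high_mask;
#endif
}
```

A caller would presumably walk the input in 16-byte steps and fall back to
the byte-at-a-time verifier for the tail or for any chunk that fails this
check. Keeping both branches in one function is the layout trade-off
discussed above: the comments sit next to both variants, at the cost of
#if blocks inside the function body.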

--
John Naylor
EDB: http://www.enterprisedb.com

Attachment: v2-0001-Use-SSE2-in-is_valid_ascii-where-available.patch (application/x-patch, 6.0 KB)
