From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | John Naylor <john(dot)naylor(at)enterprisedb(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Andres Freund <andres(at)anarazel(dot)de>, Jelte Fennema <me(at)jeltef(dot)nl> |
Subject: | Re: use SSE2 for is_valid_ascii |
Date: | 2022-08-10 22:31:20 |
Message-ID: | 20220810223120.GA1553157@nathanxps13 |
Lists: | pgsql-hackers |
On Wed, Aug 10, 2022 at 01:50:14PM +0700, John Naylor wrote:
> Here is an updated patch using the new USE_SSE2 symbol. The style is
> different from the last one in that each stanza has platform-specific
> code. I wanted to try it this way because is_valid_ascii() is already
> written in SIMD-ish style using general purpose registers and bit
> twiddling, so it seemed natural to see the two side-by-side. Sometimes
> they can share the same comment. If we think this is bad for
> readability, I can go back to one block each, but that way leads to
> duplication of code and it's difficult to see what's different for
> each platform, IMO.
This is a neat patch. I don't know that we need an entirely separate code
block for the USE_SSE2 path, but I do think that a little bit of extra
commentary would improve the readability. IMO the existing comment for the
zero accumulator has the right amount of detail.
+ /*
+ * Set all bits in each lane of the error accumulator where input
+ * bytes are zero.
+ */
+ error_cum = _mm_or_si128(error_cum,
+ _mm_cmpeq_epi8(chunk, _mm_setzero_si128()));
I wonder if reusing a zero vector (instead of creating a new one every
time) has any noticeable effect on performance.
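To make the question concrete, here is a minimal sketch of what reusing the zero vector could look like. It is not taken from the patch: the function name chunk_has_zero_byte, its signature, and the requirement that len be a multiple of 16 are all assumptions made for illustration. Only the _mm_or_si128/_mm_cmpeq_epi8 line mirrors the quoted hunk. A compiler may well hoist the _mm_setzero_si128() call out of the loop by itself, so any difference would have to be confirmed by measurement.

```c
#include <assert.h>
#include <emmintrin.h>			/* SSE2 intrinsics */
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the patch: returns true if any of the
 * first len bytes of s is zero.  Assumes len is a multiple of 16.
 */
static bool
chunk_has_zero_byte(const unsigned char *s, size_t len)
{
	/* Created once, outside the loop, rather than once per chunk. */
	const __m128i zero = _mm_setzero_si128();
	__m128i		error_cum = _mm_setzero_si128();

	assert(len % sizeof(__m128i) == 0);

	for (size_t i = 0; i < len; i += sizeof(__m128i))
	{
		/* Unaligned load, so s needs no particular alignment. */
		__m128i		chunk = _mm_loadu_si128((const __m128i *) (s + i));

		/*
		 * Set all bits in each lane of the error accumulator where input
		 * bytes are zero, reusing the shared zero vector.
		 */
		error_cum = _mm_or_si128(error_cum, _mm_cmpeq_epi8(chunk, zero));
	}

	/* Any set bit in the accumulator means a zero byte was seen. */
	return _mm_movemask_epi8(error_cum) != 0;
}
```

Comparing the generated assembly of this variant against one that calls _mm_setzero_si128() inside the loop would show whether the compiler already treats the two identically.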
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com