
Re: use SSE2 for is_valid_ascii

From:	Nathan Bossart <nathandbossart(at)gmail(dot)com>
To:	John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Andres Freund <andres(at)anarazel(dot)de>, Jelte Fennema <me(at)jeltef(dot)nl>
Subject:	Re: use SSE2 for is_valid_ascii
Date:	2022-08-11 05:35:30
Message-ID:	20220811053530.GB1610687@nathanxps13
Lists:	pgsql-hackers

On Thu, Aug 11, 2022 at 11:10:34AM +0700, John Naylor wrote:
>> I wonder if reusing a zero vector (instead of creating a new one every
>> time) has any noticeable effect on performance.
>
> Creating a zeroed register is just FOO PXOR FOO, which should get
> hoisted out of the (unrolled in this case) loop, and which a recent
> CPU will just map to a hard-coded zero in the register file, in which
> case the execution latency is 0 cycles. :-)
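A minimal C sketch of the zeroing idiom quoted above, assuming an x86-64 target with SSE2 (`<emmintrin.h>`). The helper names `zero_via_xor` and `all_zero` are hypothetical. XORing a register with itself always yields zero regardless of its prior contents, which is why compilers typically emit `pxor %xmmN, %xmmN` for `_mm_setzero_si128()`.

```c
#include <emmintrin.h>	/* SSE2 intrinsics */
#include <stdbool.h>

/* Hypothetical helper: true if every one of the 16 bytes of v is zero. */
static bool
all_zero(__m128i v)
{
	/* 0xFF in each lane where v's byte equals zero */
	__m128i		eq = _mm_cmpeq_epi8(v, _mm_setzero_si128());

	/* all 16 lane high bits set means all 16 bytes were zero */
	return _mm_movemask_epi8(eq) == 0xFFFF;
}

/* Hypothetical helper: the "FOO PXOR FOO" idiom spelled as an intrinsic. */
static __m128i
zero_via_xor(__m128i v)
{
	return _mm_xor_si128(v, v);
}
```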

Ah, indeed. At -O2, my compiler seems to zero out two registers before the
loop with either approach:

pxor %xmm0, %xmm0 ; accumulator
pxor %xmm2, %xmm2 ; always zeros

And within the loop, I see the following:

movdqu (%rdi), %xmm1
movdqu (%rdi), %xmm3
addq $16, %rdi
pcmpeqb %xmm2, %xmm1 ; check for zeros
por %xmm3, %xmm0 ; OR data into accumulator
por %xmm1, %xmm0 ; OR zero check results into accumulator
cmpq %rdi, %rsi

So the call to _mm_setzero_si128() within the loop is fine. Apologies for
the noise.
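For reference, a C loop along the following lines would be expected to compile to something like the listing above. This is a hypothetical sketch, not the patch under discussion: the name `is_valid_ascii_sse2` is made up, and it assumes `len` is a multiple of 16 (a real version would need a scalar loop for the tail). Note the `_mm_setzero_si128()` call inside the loop body.

```c
#include <assert.h>
#include <emmintrin.h>	/* SSE2 intrinsics */
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical sketch: true if s[0..len) contains no NUL bytes and no
 * bytes with the high bit set. Assumes len is a multiple of 16.
 */
static bool
is_valid_ascii_sse2(const unsigned char *s, size_t len)
{
	const unsigned char *const s_end = s + len;
	__m128i		accum = _mm_setzero_si128();

	assert(len % sizeof(__m128i) == 0);

	while (s < s_end)
	{
		__m128i		chunk = _mm_loadu_si128((const __m128i *) s);

		/* fresh zero vector each iteration; the compiler hoists it */
		__m128i		zeros = _mm_setzero_si128();

		/* 0xFF in each lane where the input byte is NUL */
		__m128i		is_zero = _mm_cmpeq_epi8(chunk, zeros);

		/* OR data into accumulator (catches high-bit bytes) */
		accum = _mm_or_si128(accum, chunk);
		/* OR zero check results into accumulator (catches NULs) */
		accum = _mm_or_si128(accum, is_zero);

		s += sizeof(__m128i);
	}

	/* collect the high bit of each of the 16 lanes */
	return _mm_movemask_epi8(accum) == 0;
}
```

Because `pcmpeqb` sets a lane to 0xFF for each zero byte, both NUL bytes and non-ASCII bytes end up setting a high bit in the accumulator, so a single `_mm_movemask_epi8()` after the loop answers both questions.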

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

In response to

Re: use SSE2 for is_valid_ascii at 2022-08-11 04:10:34 from John Naylor

Responses

Re: use SSE2 for is_valid_ascii at 2022-08-25 09:41:53 from John Naylor
