From: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: SP-GiST micro-optimizations |
Date: | 2012-08-28 18:27:18 |
Message-ID: | 503D0D86.6080105@enterprisedb.com |
Lists: | pgsql-hackers |
On 28.08.2012 20:30, Tom Lane wrote:
> Heikki Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> Drilling into the profile, I came up with three little optimizations:
>
>> 1. Within spgdoinsert, a significant portion of the CPU time is spent on
>> line 2033 in spgdoinsert.c:
>
>> memset(&out, 0, sizeof(out));
>
>> That zeroes out a small struct allocated in the stack. Replacing that
>> with MemSet() makes it faster, reducing the time spent on zeroing that
>> struct from 10% to 1.5% of the time spent in spgdoinsert(). That's not
>> very much in the big scheme of things, but it's a trivial change so
>> seems worth it.
>
> Fascinating. I'd been of the opinion that modern compilers would inline
> memset() for themselves and MemSet was probably not better than what the
> compiler could do these days. What platform are you testing on?

x64, gcc 4.7.1, running Debian.

The assembly generated for the MemSet is:

    .loc 1 2033 0 discriminator 3
    movq $0, -432(%rbp)
.LVL166:
    movq $0, -424(%rbp)
.LVL167:
    movq $0, -416(%rbp)
.LVL168:
    movq $0, -408(%rbp)
.LVL169:
    movq $0, -400(%rbp)
.LVL170:
    movq $0, -392(%rbp)

while the corresponding memset code is:

    .loc 1 2040 0 discriminator 6
    xorl %eax, %eax
    .loc 1 2042 0 discriminator 6
    cmpb $0, -669(%rbp)
    .loc 1 2040 0 discriminator 6
    movq -584(%rbp), %rdi
    movl $6, %ecx
    rep stosq

In fact, with -mstringop-strategy=unrolled_loop, I can coerce gcc to
produce code similar to the MemSet version:

    movq %rax, -440(%rbp)
    .loc 1 2040 0 discriminator 6
    xorl %eax, %eax
.L254:
    movl %eax, %edx
    addl $32, %eax
    cmpl $32, %eax
    movq $0, -432(%rbp,%rdx)
    movq $0, -424(%rbp,%rdx)
    movq $0, -416(%rbp,%rdx)
    movq $0, -408(%rbp,%rdx)
    jb .L254
    leaq -432(%rbp), %r9
    addq %r9, %rax
    .loc 1 2042 0 discriminator 6
    cmpb $0, -665(%rbp)
    .loc 1 2040 0 discriminator 6
    movq $0, (%rax)
    movq $0, 8(%rax)

I'm not sure why gcc doesn't choose that by default. Perhaps which
variant is faster depends on the CPU; I was quite surprised that MemSet
was such a clear win on my laptop. Or maybe it's a speed-space tradeoff
and gcc chooses the more compact version, although using -O3 instead of
-O2 made no difference.
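
For readers who want to reproduce the comparison outside the PostgreSQL tree, here is a minimal sketch of the two zeroing styles discussed above. The struct DemoOut and both function names are hypothetical; the struct is six longs (48 bytes on x86-64) only because the listings above store six quadwords. The word loop is loosely modeled on the idea behind MemSet's fast path and is not PostgreSQL's actual macro, which also falls back to memset() when the buffer is unaligned, too large, or the fill value is nonzero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the small on-stack struct: six long-sized words. */
typedef struct DemoOut
{
    long fields[6];
} DemoOut;

/* Zero via the library call; the compiler decides how to expand it. */
static void
zero_with_memset(DemoOut *out)
{
    memset(out, 0, sizeof(*out));
}

/*
 * Zero one long-sized word at a time. Assumption: the buffer is
 * long-aligned and its size is a multiple of sizeof(long), which the
 * asserts check in debug builds.
 */
static void
zero_with_word_loop(DemoOut *out)
{
    long *p = (long *) out;
    long *stop = (long *) ((char *) out + sizeof(*out));

    assert(((uintptr_t) out) % sizeof(long) == 0);
    assert(sizeof(*out) % sizeof(long) == 0);

    while (p < stop)
        *p++ = 0;
}
```

Compiling a file containing these functions with gcc -O2 -S and reading the output for each one should show which expansion a given compiler and target pick; the result may well differ from the listings above on other gcc versions or CPUs.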
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com