Re: Auto-vectorization speeds up multiplication of large-precision numerics

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Auto-vectorization speeds up multiplication of large-precision numerics
Date: 2020-09-08 05:50:06
Message-ID: CAJ3gD9e=X=oC+R2n7istwZ3Qfh3EHsQ=c2iH8uGyRoAujH=4Sw@mail.gmail.com
Lists: pgsql-hackers

On Tue, 8 Sep 2020 at 02:19, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> I wrote:
> > I experimented with a few different ideas such as adding restrict
> > decoration to the pointers, and eventually found that what works
> > is to write the loop termination condition as "i2 < limit"
> > rather than "i2 <= limit". It took me a long time to think of
> > trying that, because it seemed ridiculously stupid. But it works.

Ah ok.

I checked the "Auto-Vectorization in LLVM" link that you shared. All
the examples use "< n" or "> n". None of them use "<= n". Looks like a
hidden restriction.
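
For illustration, here is a minimal sketch of the two loop forms being
compared. The function and parameter names are hypothetical and are not
taken from numeric.c; the only point is that the inclusive bound "i <= last"
and the exclusive bound "i < limit" (with limit = last + 1) do the same work,
so rewriting one as the other should not change results.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Inclusive upper bound ("<="): the style of loop condition that
 * reportedly kept the compilers from vectorizing.
 * Hypothetical helper, not the real mul_var() inner loop.
 */
void
accum_inclusive(int32_t *dig, const int16_t *digits, int32_t factor, int last)
{
    for (int i = 0; i <= last; i++)
        dig[i] += factor * digits[i];
}

/*
 * Exclusive upper bound ("<"): same work, with limit = last + 1.
 * This is the form used by the examples in the LLVM vectorizer docs.
 */
void
accum_exclusive(int32_t *dig, const int16_t *digits, int32_t factor, int limit)
{
    assert(limit >= 0);
    for (int i = 0; i < limit; i++)
        dig[i] += factor * digits[i];
}
```

Whether a given compiler vectorizes either form can be checked with its
vectorization report options rather than assumed from the source.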

>
> I've done more testing and confirmed that both gcc and clang can
> vectorize the improved loop on aarch64 as well as x86_64. (clang's
> results can be confusing because -ftree-vectorize doesn't seem to
> have any effect: its vectorizer is on by default. But if you use
> -fno-vectorize it'll go back to the old, slower code.)
>
> The only buildfarm effect I've noticed is that locust and
> prairiedog, which are using nearly the same ancient gcc version,
> complain
>
> c1: warning: -ftree-vectorize enables strict aliasing. -fno-strict-aliasing is ignored when Auto Vectorization is used.
>
> which is expected (they say the same for checksum.c), but then
> there are a bunch of
>
> warning: dereferencing type-punned pointer will break strict-aliasing rules
>
> which seems worrisome. (This sort of thing is the reason I'm
> hesitant to apply higher optimization levels across the board.)
> Both animals pass the regression tests anyway, but if any other
> compilers treat -ftree-vectorize as an excuse to apply stricter
> optimization assumptions, we could be in for trouble.
>
> I looked closer and saw that all of those warnings are about
> init_var(), and this change makes them go away:
>
> -#define init_var(v) MemSetAligned(v, 0, sizeof(NumericVar))
> +#define init_var(v) memset(v, 0, sizeof(NumericVar))
>
> I'm a little inclined to commit that as future-proofing. It's
> essentially reversing out a micro-optimization I made in d72f6c750.
> I doubt I had hard evidence that it made any noticeable difference;
> and even if it did back then, modern compilers probably prefer the
> memset approach.
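
A minimal sketch of the memset-based initialization follows. DemoVar is a
hypothetical stand-in for NumericVar (the field list is an assumption made
for illustration), and demo_init_var/demo_reset are made-up names. A
plausible reading of the warnings is that MemSetAligned zeroes the struct
through a pointer of a different type, which is what strict-aliasing
diagnostics complain about; memset takes a void pointer and does not have
that problem.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for NumericVar; the fields are an assumption. */
typedef struct DemoVar
{
    int     ndigits;
    int     weight;
    int     sign;
    int     dscale;
    short  *buf;
    short  *digits;
} DemoVar;

/* memset-based initialization, in the style of the proposed init_var(). */
#define demo_init_var(v) memset((v), 0, sizeof(DemoVar))

/* Zero a DemoVar in place; no pointer cast to another object type needed. */
void
demo_reset(DemoVar *v)
{
    assert(v != NULL);
    demo_init_var(v);
}

/* Returns 1 if every byte of *v is zero, else 0. */
int
demo_var_is_zeroed(const DemoVar *v)
{
    const unsigned char *p = (const unsigned char *) v;

    for (size_t i = 0; i < sizeof(DemoVar); i++)
    {
        if (p[i] != 0)
            return 0;
    }
    return 1;
}
```

Inspecting the bytes through an unsigned char pointer, as
demo_var_is_zeroed does, is permitted by the C aliasing rules.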

Thanks. I must admit it did not occur to me that I could simply have
installed clang on my Linux machine and tried compiling this file, or
tested with some older gcc versions. I think I was using gcc 8. Do you
know which gcc version gave these warnings?

--
Thanks,
-Amit Khandekar
Huawei Technologies
