Re: Auto-vectorization speeds up multiplication of large-precision numerics

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Auto-vectorization speeds up multiplication of large-precision numerics
Date: 2020-07-13 08:57:19
Message-ID: CAJ3gD9e+j+DT1pWZDEk3Ou56=qVThH4TeJUwrTYNGv2LD57uew@mail.gmail.com
Lists: pgsql-hackers

On Fri, 10 Jul 2020 at 19:02, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> writes:
> > We normally don't compile with -O3, so very few users would get the
> > benefit of this.
>
> Yeah. I don't think changing that baseline globally would be a wise move.
>
> > We have CFLAGS_VECTOR for the checksum code. I
> > suppose if we are making the numeric code vectorizable as well, we
> > should apply this there also.
>
> > I think we need a bit of a policy decision from the group here.
>
> I'd vote in favor of applying CFLAGS_VECTOR to specific source files
> that can benefit. We already have experience with that and we've not
> detected any destabilization potential.

I tried this in utils/adt/Makefile:
+
+numeric.o: CFLAGS += ${CFLAGS_VECTOR}
+
and it works.

CFLAGS_VECTOR also includes the -funroll-loops option, which, I
believe, had shown improvements in the checksum.c runs ( [1] ). This
option makes the object file a bit bigger. For numeric.o, its size
increased by about 15K, from 116672 to 131360 bytes. I ran the
multiplication test and didn't see any additional speed-up with this
option. Also, the option does not seem to be related to vectorization.
So I was thinking of splitting CFLAGS_VECTOR into CFLAGS_VECTOR and
CFLAGS_UNROLL_LOOPS. checksum.c can use both of these flags, and
numeric.c can use only CFLAGS_VECTOR.
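
To make the proposed split concrete, here is a rough sketch, not a
tested patch. It assumes CFLAGS_VECTOR would keep only
-ftree-vectorize and that a new CFLAGS_UNROLL_LOOPS would carry
-funroll-loops. In the real tree both variables would be probed and
set by configure; they are assigned inline here only for illustration.

```makefile
# Assumption: values hard-coded for illustration only; configure would
# normally test whether the compiler accepts each option.
CFLAGS_VECTOR = -ftree-vectorize
CFLAGS_UNROLL_LOOPS = -funroll-loops

# checksum.c keeps both options, as it effectively has today.
checksum.o: CFLAGS += ${CFLAGS_UNROLL_LOOPS} ${CFLAGS_VECTOR}

# numeric.c gets only the vectorization option, avoiding the
# object-size increase that came with -funroll-loops.
numeric.o: CFLAGS += ${CFLAGS_VECTOR}
```

The two per-object rules would live in their respective directory
Makefiles; they are shown together only to compare them side by side.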

I was also wondering whether it's worth extracting only the code that
we think can be optimized into a separate file (say
numeric_vectorize.c or adt_vectorize.c, which could hold mul_var() to
start with), using that file as a collection of all such code in the
adt module, and then adding similar files to other modules as and
when needed. For numeric.c, though, there may already be some scope
for auto-vectorization in other similar regions of the file, so we may
not need a separate numeric_vectorize.c and can just pass
CFLAGS_VECTOR for this file itself.

[1] https://www.postgresql.org/message-id/flat/CA%2BU5nML8JYeGqM-k4eEwNJi5H%3DU57oPLBsBDoZUv4cfcmdnpUA%40mail.gmail.com#2ec419817ff429588dd1229fb663080e

--
Thanks,
-Amit Khandekar
Huawei Technologies
