Re: Proposal for enabling auto-vectorization for checksum calculations

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: Ants Aasma <ants(dot)aasma(at)cybertec(dot)at>
Cc: Andrew Kim <tenistarkim(at)gmail(dot)com>, Oleg Tselebrovskiy <o(dot)tselebrovskiy(at)postgrespro(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Proposal for enabling auto-vectorization for checksum calculations
Date: 2026-03-31 04:09:26
Message-ID: CANWCAZYrjnCCE6m=5oRs+Ok=sgMrdf33xM25Fxy3yp=kQAoNwA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 30, 2026 at 10:01 PM Ants Aasma <ants(dot)aasma(at)cybertec(dot)at> wrote:
>
> On Mon, 30 Mar 2026 at 15:01, John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
> > I don't remember the last time anyone did measurements, so I went
> > ahead and did that:
> >
> > master: 945ms
> > 32 AVX2: 335ms
> > 64 AVX2: 220ms
>
> I'm guessing this is on a recent Intel. Any extra width is helpful on
> Intel as they doubled vpmulld latency from under us after we had
> settled on this algorithm.

It's actually ancient and due to be replaced soon, but still several
years after the adoption of this algorithm.

> FWIW I think AVX2 (x86-64-v3) is fine.

Glad to hear it, although the patch doesn't use that build flag, so
it's not impossible there is some additional difference in the
compiler's model. Still, given the variation you found, I'll make sure
the commit message says "several times faster" so it's not specific to
my hardware.
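
For readers following along, here is a rough sketch of how AVX2 code
generation can be enabled for a single function without the
x86-64-v3 build flag. It is not the patch itself: the function names,
the lane count N_LANES, the seed values, and the use of GCC/Clang's
target attribute with __builtin_cpu_supports() are all assumptions
made for illustration. The mixing step is an FNV-1a-style multiply and
shift, which is the kind of loop that depends on the 32-bit vector
multiply (vpmulld) mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

#define N_LANES   32            /* hypothetical; cf. the "32" and "64" variants */
#define MIX_PRIME 16777619u     /* the 32-bit FNV prime */

/*
 * Shared loop body: N_LANES independent partial sums, so the inner loop
 * has no cross-lane dependency and is a candidate for auto-vectorization.
 * Assumes nwords is a multiple of N_LANES; trailing words are ignored.
 */
static inline uint32_t
block_checksum_body(const uint32_t *words, size_t nwords)
{
    uint32_t sums[N_LANES];
    uint32_t result = 0;
    size_t   i, j;

    for (j = 0; j < N_LANES; j++)
        sums[j] = (uint32_t) (j + 1) * 0x9E3779B1u;   /* arbitrary seeds */

    for (i = 0; i + N_LANES <= nwords; i += N_LANES)
        for (j = 0; j < N_LANES; j++)
        {
            uint32_t tmp = sums[j] ^ words[i + j];

            sums[j] = (tmp * MIX_PRIME) ^ (tmp >> 17);
        }

    for (j = 0; j < N_LANES; j++)
        result ^= sums[j];
    return result;
}

/* Built with whatever the baseline compiler flags allow. */
static uint32_t
block_checksum_portable(const uint32_t *words, size_t nwords)
{
    return block_checksum_body(words, nwords);
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define HAVE_AVX2_VARIANT 1
/* Same source; the compiler is allowed to use AVX2 in this function only. */
__attribute__((target("avx2")))
static uint32_t
block_checksum_avx2(const uint32_t *words, size_t nwords)
{
    return block_checksum_body(words, nwords);
}
#endif

/* Runtime dispatch: take the AVX2 variant only when the CPU reports it. */
uint32_t
block_checksum(const uint32_t *words, size_t nwords)
{
#ifdef HAVE_AVX2_VARIANT
    if (__builtin_cpu_supports("avx2"))
        return block_checksum_avx2(words, nwords);
#endif
    return block_checksum_portable(words, nwords);
}
```

Both variants compute the same value, so the choice only affects
speed. Whether the compiler actually vectorizes the inner loop, and
how well, depends on its version and cost model, which is one reason
results can differ from a build that uses the flag globally.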

--
John Naylor
Amazon Web Services
