Re: Change GUC hashtable to use simplehash?

From: Ants Aasma <ants(dot)aasma(at)cybertec(dot)at>
To: John Naylor <johncnaylorls(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gurjeet Singh <gurjeet(at)singh(dot)im>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Change GUC hashtable to use simplehash?
Date: 2024-01-30 12:51:24
Message-ID: CANwKhkO364C3moZk_m+MEy+ryTB8ehh-Sh-EqLg3Uc94y2P3ow@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 30 Jan 2024 at 12:04, John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
>
> On Tue, Jan 30, 2024 at 4:13 AM Ants Aasma <ants(dot)aasma(at)cybertec(dot)at> wrote:
> > But given that we know the data length and we have it in a register
> > already, it's easy enough to just mask out data past the end with a
> > shift. See patch 1. Performance benefit is about 1.5x Measured on a
> > small test harness that just hashes and finalizes an array of strings,
> > with a data dependency between consecutive hashes (next address
> > depends on the previous hash output).
>
> Interesting work! I've taken this idea and (I'm guessing, haven't
> tested) improved it by re-using an intermediate step for the
> conditional, simplifying the creation of the mask, and moving the
> bitscan out of the longest dependency chain. Since you didn't attach
> the test harness, would you like to run this and see how it fares?
> (v16-0001 is same as your 0001, and v16-0002 builds upon it.) I plan
> to test myself as well, but since your test tries to model true
> latency, I'm more interested in that one.

It didn't calculate the same result because the if (mask) condition
was incorrect. Changed it to if (chunk & 0xFF) and removed the right
shift from the mask. It seems to be half a nanosecond faster, but as I
don't have a machine set up for microbenchmarking it's quite close to
measurement noise.

I didn't post the harness as it's currently so messy to be near
useless to others. But if you'd like to play around, I can tidy it up
a bit and post it.

> > Not sure if the second one is worth the extra code.
>
> I'd say it's not worth optimizing the case we think won't be taken
> anyway. I also like having a simple path to assert against.

Agreed.

As an addendum, I couldn't resist trying out using 256bit vectors with
two parallel AES hashes running, unaligned loads with special casing
page boundary straddling loads. Requires -march=x86-64-v3 -maes. About
20% faster than fasthash on short strings, 2.2x faster on 4k strings.
Right now requires 4 bytes alignment (uses vpmaskmovd), but could be
made to work with any alignment.

Regards,
Ants Aasma

Attachment Content-Type Size
simd-hash-avx2-aesni.c text/x-csrc 2.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message shveta malik 2024-01-30 12:52:54 Re: Synchronizing slots from primary to standby
Previous Message Pavel Stehule 2024-01-30 12:51:17 Re: Bytea PL/Perl transform