From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | John Naylor <johncnaylorls(at)gmail(dot)com> |
Cc: | Ants Aasma <ants(at)cybertec(dot)at>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: add AVX2 support to simd.h |
Date: | 2024-03-25 21:37:54 |
Message-ID: | 20240325213754.GA3094030@nathanxps13 |
Lists: | pgsql-hackers |
Here is what I have staged for commit. One notable difference in this
version of the patch is that I've changed
+ if (nelem <= nelem_per_iteration)
+ goto one_by_one;
to
+ if (nelem < nelem_per_iteration)
+ goto one_by_one;
I realized that there's no reason to jump to the one-by-one linear search
code when nelem == nelem_per_iteration: the worst that can happen is that we
process every element twice when the value isn't present in the array. The
benchmark I've been using also shows a significant speedup for this case
with this change (on the order of 75%), which I suspect is due to some
combination of branch prediction, caching, fewer instructions, etc.
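To make the nelem == nelem_per_iteration case concrete, here is a minimal, portable sketch of the search structure the paragraph above implies. It assumes the patched function runs a block loop, then re-checks the last nelem_per_iteration elements as one overlapping block, and only falls back to the one-by-one loop for arrays shorter than one block. The names lfind32_sketch(), block_contains(), NELEM_PER_ITERATION and the comparisons counter are hypothetical. block_contains() is a scalar stand-in for a SIMD block compare, so the sketch shows which elements get examined, not the speedup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical block width: 4 "registers" x 4 uint32 lanes, as with
 * 128-bit vectors.  Must be a power of two for the rounding below.
 */
#define NELEM_PER_ITERATION 16u

/* Instrumentation for this sketch only: counts element comparisons. */
static unsigned long comparisons = 0;

/*
 * Scalar stand-in for a SIMD block helper: reports whether key appears in
 * base[0 .. NELEM_PER_ITERATION - 1], examining the whole block without
 * early exit, as a vector compare would.
 */
static bool
block_contains(uint32_t key, const uint32_t *base)
{
	bool		found = false;

	for (uint32_t j = 0; j < NELEM_PER_ITERATION; j++)
	{
		comparisons++;
		found |= (base[j] == key);
	}
	return found;
}

static bool
lfind32_sketch(uint32_t key, const uint32_t *base, uint32_t nelem)
{
	uint32_t	i = 0;

	/* round nelem down to a multiple of the block width */
	const uint32_t tail_idx = nelem & ~(NELEM_PER_ITERATION - 1);

	assert((NELEM_PER_ITERATION & (NELEM_PER_ITERATION - 1)) == 0);

	/* the changed guard: only arrays shorter than one block go one by one */
	if (nelem < NELEM_PER_ITERATION)
		goto one_by_one;

	/* process as many full blocks as possible */
	do
	{
		if (block_contains(key, &base[i]))
			return true;
		i += NELEM_PER_ITERATION;
	} while (i < tail_idx);

	/*
	 * Check the final block-width window.  It overlaps elements already
	 * examined (all of them when nelem == NELEM_PER_ITERATION), which costs
	 * extra work but does not affect correctness.
	 */
	return block_contains(key, &base[nelem - NELEM_PER_ITERATION]);

one_by_one:
	for (; i < nelem; i++)
	{
		comparisons++;
		if (base[i] == key)
			return true;
	}
	return false;
}
```

With 16 elements and an absent key, the sketch makes 32 comparisons (one pass of the block loop plus the fully overlapping final window), whereas the old `<=` guard would have sent that case to the scalar loop for 16 comparisons. With 15 elements it still takes the one-by-one path.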
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachment | Content-Type | Size |
---|---|---|
v9-0001-Micro-optimize-pg_lfind32.patch | text/x-diff | 5.4 KB |