| From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> | 
|---|---|
| To: | John Naylor <johncnaylorls(at)gmail(dot)com> | 
| Cc: | Ants Aasma <ants(at)cybertec(dot)at>, pgsql-hackers(at)postgresql(dot)org | 
| Subject: | Re: add AVX2 support to simd.h | 
| Date: | 2024-03-25 21:37:54 | 
| Message-ID: | 20240325213754.GA3094030@nathanxps13 | 
| Lists: | pgsql-hackers | 
Here is what I have staged for commit.  One notable difference in this
version of the patch is that I've changed
    +	if (nelem <= nelem_per_iteration)
    +		goto one_by_one;
to
    +	if (nelem < nelem_per_iteration)
    +		goto one_by_one;
I realized that there's no reason to jump to the one-by-one linear search
code when nelem == nelem_per_iteration, as the worst thing that will happen
is that we'll process all the elements twice if the value isn't present in
the array.  The benchmark I've been using also shows a significant
speedup for this case with this change (on the order of 75%), which I
imagine is due to some combination of branch prediction, caching, fewer
instructions, etc.
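
To make the nelem == nelem_per_iteration case concrete, here is a minimal scalar sketch of the control flow described above. It is not the code from the attached patch: `lfind32_sketch`, `block_contains`, and the block size of 16 are hypothetical names and values chosen for illustration, and a plain loop stands in for the SIMD comparison. The point is that the final block is always read from `nelem - NELEM_PER_ITERATION`, so when nelem equals the block size, the main loop and the tail step simply examine the same elements twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block size; stands in for nelem_per_iteration (power of two). */
#define NELEM_PER_ITERATION 16u

/* Scalar stand-in for the SIMD comparison of one full block. */
static bool
block_contains(uint32_t key, const uint32_t *block)
{
	bool		found = false;

	for (uint32_t j = 0; j < NELEM_PER_ITERATION; j++)
		found |= (block[j] == key);
	return found;
}

static bool
lfind32_sketch(uint32_t key, const uint32_t *base, uint32_t nelem)
{
	uint32_t	i = 0;

	/* Round nelem down to a multiple of the block size. */
	const uint32_t tail_idx = nelem & ~(NELEM_PER_ITERATION - 1);

	assert(base != NULL || nelem == 0);

	/* Fewer elements than one block: fall back to a one-by-one search. */
	if (nelem < NELEM_PER_ITERATION)
	{
		for (; i < nelem; i++)
		{
			if (base[i] == key)
				return true;
		}
		return false;
	}

	/* Process every full block. */
	for (i = 0; i < tail_idx; i += NELEM_PER_ITERATION)
	{
		if (block_contains(key, &base[i]))
			return true;
	}

	/*
	 * Process the last block, which may overlap elements already checked.
	 * When nelem == NELEM_PER_ITERATION this re-reads the whole array, which
	 * is redundant but harmless.
	 */
	return block_contains(key, &base[nelem - NELEM_PER_ITERATION]);
}
```

With this shape, an array of exactly 16 elements never reaches the one-by-one loop; a miss costs two block comparisons over the same data rather than 16 individual comparisons and branches.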
-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
| Attachment | Content-Type | Size | 
|---|---|---|
| v9-0001-Micro-optimize-pg_lfind32.patch | text/x-diff | 5.4 KB | 