| From: | Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> |
|---|---|
| To: | Bryan Green <dbryan(dot)green(at)gmail(dot)com> |
| Cc: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Avoid multiple calls to memcpy (src/backend/access/index/genam.c) |
| Date: | 2026-03-12 19:27:13 |
| Message-ID: | CAEudQArY+Kb0EjL1EwdbSecerJ4DsH=ywcBA7_X7eVDrEwdVWQ@mail.gmail.com |
| Lists: | pgsql-hackers |
On Thu, Mar 12, 2026 at 16:21, Bryan Green <dbryan(dot)green(at)gmail(dot)com>
wrote:
> I modified your memcpy1.c program so that the version functions are not
> inlined. I changed the memcpy function call in version 1, added volatile
> to block some dead-code elimination (DCE) opportunities, and added a range
> of N values to keep the compiler from specializing the code for N = 4.
> Before these changes, DCE removed the work entirely and the test1 function
> was just a ret.
>
> The interesting issue is the use of malloc versus the stack. The malloc
> case will probably track more closely with PG's use of palloc, so I would
> say that in that case this is an optimization. It might be fun to compile
> PG with and without the patch (in debug mode) and actually see what gets
> generated for this function.
>
> Here are the results I got using your modified benchmark:
> --- stack allocated ---
> stack n=1 v1(patch):  49721599 ns v2(original): 21477302 ns ratio: 2.315 original wins
> stack n=2 v1(patch):  52065462 ns v2(original): 28765199 ns ratio: 1.810 original wins
> stack n=3 v1(patch):  58914958 ns v2(original): 39726110 ns ratio: 1.483 original wins
> stack n=4 v1(patch):  64585275 ns v2(original): 47046397 ns ratio: 1.373 original wins
> stack n=5 v1(patch):  73929844 ns v2(original): 58588698 ns ratio: 1.262 original wins
> stack n=6 v1(patch):  95465376 ns v2(original): 67807817 ns ratio: 1.408 original wins
> stack n=7 v1(patch):  86910226 ns v2(original): 76999488 ns ratio: 1.129 original wins
> stack n=8 v1(patch): 107765417 ns v2(original): 86046016 ns ratio: 1.252 original wins
>
> --- malloc allocated ---
> malloc n=1 v1(patch): 133283824 ns v2(original): 141361091 ns ratio: 0.943 patch wins
> malloc n=2 v1(patch): 145625895 ns v2(original): 180912711 ns ratio: 0.805 patch wins
> malloc n=3 v1(patch): 153975594 ns v2(original): 228459879 ns ratio: 0.674 patch wins
> malloc n=4 v1(patch): 154483094 ns v2(original): 248157408 ns ratio: 0.623 patch wins
> malloc n=5 v1(patch): 157710598 ns v2(original): 298795018 ns ratio: 0.528 patch wins
> malloc n=6 v1(patch): 165196636 ns v2(original): 332940132 ns ratio: 0.496 patch wins
> malloc n=7 v1(patch): 169576370 ns v2(original): 358438778 ns ratio: 0.473 patch wins
> malloc n=8 v1(patch): 184463815 ns v2(original): 403721513 ns ratio: 0.457 patch wins
>
Thanks for your attention and tests.
I think the patch can move forward, then.
best regards,
Ranier Vilela
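
For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of the measurement technique described in the quoted text: non-inlined variants, a volatile sink to discourage dead-code elimination, a runtime n, and a choice between stack and malloc destinations. It is not the memcpy1.c program from this thread, which is not shown here. DemoKey, copy_keys_single, copy_keys_loop and time_copy_ns are hypothetical names, and the assumption that the patch replaces a per-element memcpy loop with a single memcpy is inferred only from the subject line. Whether the original benchmark allocates inside the timed loop is also not stated; this sketch does.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define MAX_KEYS 8

/* Hypothetical fixed-size key struct; NOT PostgreSQL's ScanKeyData. */
typedef struct DemoKey
{
    int   flags;
    short attno;
    short strategy;
    long  argument;
} DemoKey;

#if defined(__GNUC__) || defined(__clang__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE                /* other compilers: no inlining control here */
#endif

/* Variant 1 (assumed shape of the patch): one memcpy for the whole array. */
NOINLINE void
copy_keys_single(DemoKey *dst, const DemoKey *src, int n)
{
    memcpy(dst, src, (size_t) n * sizeof(DemoKey));
}

/* Variant 2 (assumed shape of the original): one memcpy per element. */
NOINLINE void
copy_keys_loop(DemoKey *dst, const DemoKey *src, int n)
{
    for (int i = 0; i < n; i++)
        memcpy(&dst[i], &src[i], sizeof(DemoKey));
}

typedef void (*copy_fn) (DemoKey *dst, const DemoKey *src, int n);

/* Reading the result into a volatile keeps the copies observable. */
static volatile long sink;

/*
 * Time `iters` calls of `fn` copying `n` keys and return elapsed nanoseconds.
 * use_heap != 0 allocates the destination with malloc on every iteration;
 * otherwise a stack array is reused.
 */
int64_t
time_copy_ns(copy_fn fn, int n, long iters, int use_heap)
{
    DemoKey src[MAX_KEYS];
    DemoKey stack_dst[MAX_KEYS];
    struct timespec t0, t1;

    assert(n >= 1 && n <= MAX_KEYS);
    memset(src, 0, sizeof(src));
    for (int i = 0; i < MAX_KEYS; i++)
        src[i].argument = i + 1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long it = 0; it < iters; it++)
    {
        DemoKey *dst = use_heap ? malloc((size_t) n * sizeof(DemoKey)) : stack_dst;

        if (dst == NULL)
            abort();
        fn(dst, src, n);
        sink = sink + dst[n - 1].argument;
        if (use_heap)
            free(dst);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (int64_t) (t1.tv_sec - t0.tv_sec) * 1000000000LL
        + (int64_t) (t1.tv_nsec - t0.tv_nsec);
}
```

Calling time_copy_ns for n = 1..8 with each variant, once with use_heap = 0 and once with use_heap = 1, and dividing the v1 time by the v2 time gives a ratio column with the same meaning as in the tables above (below 1.0 means the single-memcpy variant is faster). Absolute numbers will depend on compiler, flags and libc.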