Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)

From: Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)
Date: 2022-05-19 00:04:23
Message-ID: CAEudQArJk9fOm0sSLL8UOoaRKn6y59aKiA_NOM9qef0AXRZFqw@mail.gmail.com
Lists: pgsql-hackers

On Wed, 18 May 2022 at 19:57, David Rowley <dgrowleyml(at)gmail(dot)com>
wrote:

> On Thu, 19 May 2022 at 02:08, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com> wrote:
> > That would initialize the content at compilation and not at runtime,
> > correct?
>
> Your mental model of compilation and run-time might be flawed here.
> There's no such thing as zeroing memory at compile time. There's only
> emitting instructions that perform those tasks at run-time.
> https://godbolt.org/ might help your understanding.
>
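To make the run-time point concrete, here is a minimal sketch. The struct `DemoEntry` and all function names are hypothetical and exist only for this illustration. Both functions zero a local variable, and in both cases the compiler has to emit instructions that run on every call; the initializer form is only a different spelling of that run-time work.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical struct, used only for this illustration. */
typedef struct DemoEntry
{
    int   id;
    long  counter;
    char  name[16];
} DemoEntry;

/* Zeroing via an initializer: the stores still execute on every call. */
DemoEntry
make_with_initializer(void)
{
    DemoEntry e = {0};

    return e;
}

/* Zeroing via an explicit memset() call at run time. */
DemoEntry
make_with_memset(void)
{
    DemoEntry e;

    memset(&e, 0, sizeof(e));
    return e;
}

/* Returns 1 if every member of *e is zero, 0 otherwise. */
int
entry_is_zero(const DemoEntry *e)
{
    size_t i;

    assert(e != NULL);
    if (e->id != 0 || e->counter != 0)
        return 0;
    for (i = 0; i < sizeof(e->name); i++)
    {
        if (e->name[i] != 0)
            return 0;
    }
    return 1;
}
```

Pasting the two `make_*` functions into https://godbolt.org/ is one way to compare the instructions each compiler emits for them.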
> > There are a lot of cases using MemSet (with struct variables), and on
> > 64-bit Windows, long is 4 (four) bytes.
> > So I believe that MemSet is less efficient on Windows than on Linux.
> > "The size of the '_vstart' buffer is not a multiple of the element size
> > of the type 'long'."
> > (message from the PVS-Studio static analysis tool)
>
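For readers who have not looked at the macro, the following is a simplified sketch of a long-at-a-time zeroing loop of the kind under discussion. It is not PostgreSQL's actual MemSet() definition: the function name `demo_zero`, the limit `DEMO_LOOP_LIMIT` and its value, and the returned store count are assumptions added for illustration. The loop writes one `long` per iteration, so where `long` is 4 bytes it needs twice as many stores for the same length as where `long` is 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_LOOP_LIMIT 1024    /* hypothetical threshold for this sketch */

/*
 * Zero len bytes at start.  Returns the number of long-sized stores the
 * loop performed, or 0 if the call fell back to memset().
 */
size_t
demo_zero(void *start, size_t len)
{
    size_t mask = sizeof(long) - 1;
    size_t stores = 0;

    assert(start != NULL);

    /* Use the loop only for long-aligned, long-multiple, small requests. */
    if (((uintptr_t) start & mask) == 0 &&
        (len & mask) == 0 &&
        len <= DEMO_LOOP_LIMIT)
    {
        long *p = (long *) start;
        long *stop = (long *) ((char *) start + len);

        while (p < stop)
        {
            *p++ = 0;
            stores++;
        }
        return stores;
    }

    /* Everything else goes to the library routine. */
    memset(start, 0, len);
    return 0;
}
```

Calling `demo_zero(buf, 64)` on a long-aligned buffer performs `64 / sizeof(long)` stores: 8 where `long` is 8 bytes and 16 where it is 4 bytes.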
> I've been wondering for a while if we really need to have the MemSet()
> macro. I see it was added in 8cb415449 (1997). I think compilers have
> evolved quite a bit in the past 25 years, so it could be time to
> revisit that.
>
> Your comment on the sizeof(long) on win64 is certainly true. I wrote
> the attached C program to test the performance difference.
>
> (windows 64-bit)
> >cl memset.c /Ox
> >memset 200000000
> Running 200000000 loops
> MemSet: size 8: 1.833000 seconds
> MemSet: size 16: 1.841000 seconds
> MemSet: size 32: 1.838000 seconds
> MemSet: size 64: 1.851000 seconds
> MemSet: size 128: 3.228000 seconds
> MemSet: size 256: 5.278000 seconds
> MemSet: size 512: 3.943000 seconds
> memset: size 8: 0.065000 seconds
> memset: size 16: 0.131000 seconds
> memset: size 32: 0.262000 seconds
> memset: size 64: 0.530000 seconds
> memset: size 128: 1.169000 seconds
> memset: size 256: 2.950000 seconds
> memset: size 512: 3.191000 seconds
>
> It seems like there are no cases there where MemSet is faster than
> memset. I was careful to only provide MemSet() with inputs that
> result in it not using the memset fallback. I also provided constants
> so that the decision about which method to use was known at compile
> time.
>
> It's not clear to me why 512 is faster than 256. I saw the same on a
> repeat run.
>
> Changing "long" to "long long" it looks like:
>
> >memset 200000000
> Running 200000000 loops
> MemSet: size 8: 0.066000 seconds
> MemSet: size 16: 1.978000 seconds
> MemSet: size 32: 1.982000 seconds
> MemSet: size 64: 1.973000 seconds
> MemSet: size 128: 1.970000 seconds
> MemSet: size 256: 3.225000 seconds
> MemSet: size 512: 5.366000 seconds
> memset: size 8: 0.069000 seconds
> memset: size 16: 0.132000 seconds
> memset: size 32: 0.265000 seconds
> memset: size 64: 0.527000 seconds
> memset: size 128: 1.161000 seconds
> memset: size 256: 2.976000 seconds
> memset: size 512: 3.179000 seconds
>
> The situation is a little different on my Linux machine:
>
> $ gcc memset.c -o memset -O2
> $ ./memset 200000000
> Running 200000000 loops
> MemSet: size 8: 0.000002 seconds
> MemSet: size 16: 0.000000 seconds
> MemSet: size 32: 0.094041 seconds
> MemSet: size 64: 0.184618 seconds
> MemSet: size 128: 1.781503 seconds
> MemSet: size 256: 2.547910 seconds
> MemSet: size 512: 4.005173 seconds
> memset: size 8: 0.046156 seconds
> memset: size 16: 0.046123 seconds
> memset: size 32: 0.092291 seconds
> memset: size 64: 0.184509 seconds
> memset: size 128: 1.781518 seconds
> memset: size 256: 2.577104 seconds
> memset: size 512: 4.004757 seconds
>
> It looks like part of the work might be getting optimised away in the
> 8-16 MemSet() calls.
>
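Timings near zero usually mean the compiler proved the stores dead and removed them. The sketch below shows one common way to guard a timing loop against that. It is not the attached memset.c, which is not reproduced in this message; `BENCH_SIZE`, `bench_buf`, `keep_alive` and `bench_memset` are hypothetical names. The empty asm statement with a "memory" clobber is a GCC/Clang extension, and the fallback for other compilers is only a weaker hint, so treat this as an assumption-laden sketch rather than a portable recipe.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define BENCH_SIZE 64           /* compile-time constant, as in the quoted test */

static char bench_buf[BENCH_SIZE];

/*
 * Make the optimizer assume the memory behind p may be read, so the
 * preceding stores cannot be discarded as dead.
 */
static void
keep_alive(void *p)
{
#if defined(__GNUC__) || defined(__clang__)
    __asm__ __volatile__("" : : "r"(p) : "memory");
#else
    /* Weaker fallback: force a read of one byte through a volatile pointer. */
    static volatile char sink;

    sink = *(volatile char *) p;
#endif
}

/* Time `loops` fixed-size memset() calls; returns elapsed CPU seconds. */
double
bench_memset(long loops)
{
    clock_t begin;
    clock_t end;
    long    i;

    assert(loops >= 0);

    begin = clock();
    for (i = 0; i < loops; i++)
    {
        memset(bench_buf, 0, BENCH_SIZE);
        keep_alive(bench_buf);
    }
    end = clock();

    return (double) (end - begin) / CLOCKS_PER_SEC;
}
```

A caller would run something like `bench_memset(200000000)` and print the result. The barrier itself has a small cost, so absolute numbers from a loop like this are not directly comparable with the figures quoted in this thread.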
> clang seems to have the opposite for size 8.
>
> $ clang memset.c -o memset -O2
> $ ./memset 200000000
> Running 200000000 loops
> MemSet: size 8: 0.007653 seconds
> MemSet: size 16: 0.005771 seconds
> MemSet: size 32: 0.011539 seconds
> MemSet: size 64: 0.023095 seconds
> MemSet: size 128: 0.046130 seconds
> MemSet: size 256: 0.092269 seconds
> MemSet: size 512: 0.968564 seconds
> memset: size 8: 0.000000 seconds
> memset: size 16: 0.005776 seconds
> memset: size 32: 0.011559 seconds
> memset: size 64: 0.023069 seconds
> memset: size 128: 0.046129 seconds
> memset: size 256: 0.092243 seconds
> memset: size 512: 0.968534 seconds
>
The results from clang only reinforce the argument in favor of native
memset.
There is still room for gcc to improve at 8 and 16 bytes, and I expect that
at some point it will, which would leave memset as fast as or faster than
MemSet on every platform and compiler tested above.

regards,
Ranier Vilela
