Re: tweaking MemSet() performance

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tweaking MemSet() performance
Date: 2002-08-29 19:37:26
Message-ID: 200208291937.g7TJbQC20180@candle.pha.pa.us
Lists: pgsql-hackers


I consider this a very good test. As you can see from the date of my
last test, 1997/09/11, I think I may have had a dual Pentium Pro at that
point, and hardware has certainly changed since then. I did try 128 at
that time and found it to be slower, but with newer hardware, it is very
possible it has improved.

I remember being surprised, when I wrote that macro, that there was any
improvement at all, but obviously there is a gain, and the gain is getting
bigger.

I tested the following program:

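#include <stdlib.h>             /* atoi() */
#include <string.h>
#include "postgres.h"

/* Raise the limit so every size tested stays at or below it */
#undef MEMSET_LOOP_LIMIT
#define MEMSET_LOOP_LIMIT 1000000

int
main(int argc, char **argv)
{
    int         len;
    long long   i;

    if (argc < 2)
        return 1;               /* usage: tst1 <buffer size in bytes> */
    len = atoi(argv[1]);

    char        buffer[len];

    for (i = 0; i < 9900000; i++)
        MemSet(buffer, 0, len);
    return 0;
}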

and, yes, -O2 is significant! Looks like we use -O2 on all platforms
that use GCC so we should be OK there.

I tested with the following script:

for TIME in 64 128 256 512 1024 2048 4096; do echo "*$TIME\c";
time tst1 $TIME; done

and got for MemSet:

*64
real 0m1.001s
user 0m1.000s
sys 0m0.003s
*128
real 0m1.578s
user 0m1.567s
sys 0m0.013s
*256
real 0m2.723s
user 0m2.723s
sys 0m0.003s
*512
real 0m5.044s
user 0m5.029s
sys 0m0.013s
*1024
real 0m9.621s
user 0m9.621s
sys 0m0.003s
*2048
real 0m18.821s
user 0m18.811s
sys 0m0.013s
*4096
real 0m37.266s
user 0m37.266s
sys 0m0.003s

and for memset():

*64
real 0m1.813s
user 0m1.801s
sys 0m0.014s
*128
real 0m2.489s
user 0m2.499s
sys 0m0.994s
*256
real 0m4.397s
user 0m5.389s
sys 0m0.005s
*512
real 0m5.186s
user 0m6.170s
sys 0m0.015s
*1024
real 0m6.676s
user 0m6.676s
sys 0m0.003s
*2048
real 0m9.766s
user 0m9.776s
sys 0m0.994s
*4096
real 0m15.970s
user 0m15.954s
sys 0m0.003s

so for BSD/OS, the break-even is 512.
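
The break-even reading can be checked mechanically from the two listings. The following is only a minimal sketch: the array and function names are made up for illustration, and the numbers are the "real" times transcribed from the results in this message.

```c
#include <assert.h>
#include <stddef.h>

/* "real" times in seconds, transcribed from the two listings in this message */
static const int    sizes[]      = {64, 128, 256, 512, 1024, 2048, 4096};
static const double macro_real[] = {1.001, 1.578, 2.723, 5.044, 9.621, 18.821, 37.266};
static const double libc_real[]  = {1.813, 2.489, 4.397, 5.186, 6.676, 9.766, 15.970};

/*
 * Hypothetical helper: return the largest tested size at which the
 * MemSet() timing is no slower than the memset() timing, or 0 if the
 * macro never wins.
 */
static int
largest_winning_size(const int *sz, const double *macro_t,
                     const double *libc_t, size_t n)
{
    int         best = 0;
    size_t      i;

    assert(sz != NULL && macro_t != NULL && libc_t != NULL);

    for (i = 0; i < n; i++)
    {
        if (macro_t[i] <= libc_t[i])
            best = sz[i];
    }
    return best;
}
```

Called as largest_winning_size(sizes, macro_real, libc_real, 7), it picks out 512, the last size where the macro still wins (5.044s against 5.186s); at 1024 the library function is already well ahead.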

I am on a dual P3/550 using GCC 2.95.2. I can tell you exactly why my
break-even is lower than most: BSD/OS has assembly-language memset()
functions in libc.

I suggest changing the MEMSET_LOOP_LIMIT to 512.
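
For readers without the source at hand, here is roughly what a limit of 512 would mean in practice. This is a simplified stand-in written as a function, not the actual MemSet() macro from include/c.h; the names sketch_memset and SKETCH_LOOP_LIMIT are hypothetical, and the exact conditions the real macro checks are an assumption here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name; the real setting is MEMSET_LOOP_LIMIT in include/c.h */
#define SKETCH_LOOP_LIMIT 512

/*
 * Simplified stand-in for MemSet(), NOT the macro from c.h.
 * Small, long-aligned zero fills use a word-at-a-time loop;
 * everything else is handed to the library memset().
 */
static void
sketch_memset(void *start, int val, size_t len)
{
    assert(start != NULL);

    if (val == 0 &&
        ((uintptr_t) start % sizeof(long)) == 0 &&
        (len % sizeof(long)) == 0 &&
        len <= SKETCH_LOOP_LIMIT)
    {
        long       *p = (long *) start;
        long       *stop = (long *) ((char *) start + len);

        /* inline loop: one store per long, no function call */
        while (p < stop)
            *p++ = 0;
    }
    else
        memset(start, val, len);    /* large, unaligned, or nonzero fills */
}
```

With the limit at 512, a 512-byte aligned zero fill goes through the loop and a 1024-byte one goes to memset(), which matches where the timings above cross over on this machine.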

---------------------------------------------------------------------------

Neil Conway wrote:
> In include/c.h, MemSet() is defined to be different than the stock
> function memset() only when copying less than or equal to
> MEMSET_LOOP_LIMIT bytes (currently 64). The comments above the macro
> definition note:
>
> * We got the 64 number by testing this against the stock memset() on
> * BSD/OS 3.0. Larger values were slower. bjm 1997/09/11
> *
> * I think the crossover point could be a good deal higher for
> * most platforms, actually. tgl 2000-03-19
>
> I decided to investigate Tom's suggestion and determine the
> performance of MemSet() versus memset() on my machine, for various
> values of MEMSET_LOOP_LIMIT. The machine this is being tested on is a
> Pentium 4 1.8 GHz with RDRAM, running Linux 2.4.19pre8 with GCC 3.1.1
> and glibc 2.2.5 -- the results may or may not apply to other
> machines.
>
> The test program was:
>
> #include <string.h>
> #include "postgres.h"
>
> #undef MEMSET_LOOP_LIMIT
> #define MEMSET_LOOP_LIMIT BUFFER_SIZE
>
> int
> main(void)
> {
>     char        buffer[BUFFER_SIZE];
>     long long   i;
>
>     for (i = 0; i < 99000000; i++)
>     {
>         MemSet(buffer, 0, sizeof(buffer));
>     }
>
>     return 0;
> }
>
> (I manually changed MemSet() to memset() when testing the performance
> of the latter function.)
>
> It was compiled like so:
>
> gcc -O2 -DBUFFER_SIZE=xxx -Ipgsql/src/include memset.c
>
> (The -O2 optimization flag is important: the results are significantly
> different if it is not used.)
>
> Here are the results (each timing is the 'total' listing from 'time
> ./a.out'):
>
> BUFFER_SIZE = 64
> MemSet() -> 2.756, 2.810, 2.789
> memset() -> 13.844, 13.782, 13.778
>
> BUFFER_SIZE = 128
> MemSet() -> 5.848, 5.989, 5.861
> memset() -> 15.637, 15.631, 15.631
>
> BUFFER_SIZE = 256
> MemSet() -> 9.602, 9.652, 9.633
> memset() -> 19.305, 19.370, 19.302
>
> BUFFER_SIZE = 512
> MemSet() -> 17.416, 17.462, 17.353
> memset() -> 26.657, 26.658, 26.678
>
> BUFFER_SIZE = 1024
> MemSet() -> 32.144, 32.179, 32.086
> memset() -> 41.186, 41.115, 41.176
>
> BUFFER_SIZE = 2048
> MemSet() -> 60.39, 60.48, 60.32
> memset() -> 71.19, 71.18, 71.17
>
> BUFFER_SIZE = 4096
> MemSet() -> 118.29, 120.07, 118.69
> memset() -> 131.40, 131.41
>
> ... at which point I stopped benchmarking.
>
> Is the benchmark above a reasonable assessment of memset() / MemSet()
> performance when copying word-aligned amounts of memory? If so, what's
> a good value for MEMSET_LOOP_LIMIT (perhaps 512)?
>
> Also, if anyone would like to contribute the results of doing the
> benchmark on their particular system, that might provide some useful
> additional data points.
>
> Cheers,
>
> Neil
>
> --
> Neil Conway <neilc(at)samurai(dot)com> || PGP Key ID: DB3C29FC

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
