Re: Faster StrNCpy

From: mark(at)mark(dot)mielke(dot)cc
To: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Faster StrNCpy
Date: 2006-09-29 21:23:31
Message-ID: 20060929212331.GB30048@mark.mielke.cc
Lists: pgsql-hackers pgsql-patches

If anybody is curious, here are my numbers for an AMD X2 3800+:

$ gcc -O3 -std=c99 -DSTRING='"This is a very long sentence that is expected to be slow."' -o x x.c y.c strlcpy.c ; ./x
NONE: 620268 us
MEMCPY: 683135 us
STRNCPY: 7952930 us
STRLCPY: 10042364 us

$ gcc -O3 -std=c99 -DSTRING='"Short sentence."' -o x x.c y.c strlcpy.c ; ./x
NONE: 554694 us
MEMCPY: 691390 us
STRNCPY: 7759933 us
STRLCPY: 3710627 us

$ gcc -O3 -std=c99 -DSTRING='""' -o x x.c y.c strlcpy.c ; ./x
NONE: 631266 us
MEMCPY: 775340 us
STRNCPY: 7789267 us
STRLCPY: 550430 us

Each invocation represents 100 million calls to each of the functions.
Each function accepts a 'dst' and a 'src' argument, and assumes that it
is copying 64 bytes from 'src' to 'dst'. The none function does
nothing. The memcpy function calls memcpy(), the strncpy function calls
strncpy(), and the strlcpy function calls the strlcpy() that was posted
from the BSD sources. (GLIBC doesn't have strlcpy() on my machine.)
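
For readers without the sources, the setup described above can be
sketched roughly as follows. This is not the posted x.c/y.c: the names
copy_none, copy_memcpy, copy_strncpy, copy_strlcpy, sketch_strlcpy and
time_calls are made up for illustration, sketch_strlcpy is a simplified
stand-in for the BSD strlcpy(), and clock() is assumed as the timer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define COPY_LEN 64   /* every wrapper assumes a 64-byte copy */

/* Simplified strlcpy-style copy (illustrative, not the BSD source):
 * copies at most size-1 bytes, always NUL-terminates, returns strlen(src). */
static size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);
    if (size > 0) {
        size_t n = (len >= size) ? size - 1 : len;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}

/* The four variants; src and dst must both be at least COPY_LEN bytes. */
void copy_none(char *dst, const char *src)    { (void) dst; (void) src; }
void copy_memcpy(char *dst, const char *src)  { memcpy(dst, src, COPY_LEN); }
void copy_strncpy(char *dst, const char *src) { strncpy(dst, src, COPY_LEN); }
void copy_strlcpy(char *dst, const char *src) { sketch_strlcpy(dst, src, COPY_LEN); }

typedef void (*copy_fn)(char *dst, const char *src);

/* Calls fn the given number of times and returns the elapsed CPU time
 * in microseconds. Putting the wrappers in a separate file is one way
 * to keep the optimizer from inlining them into this loop. */
long time_calls(copy_fn fn, char *dst, const char *src, long calls)
{
    clock_t start, end;

    assert(calls > 0);
    start = clock();
    for (long i = 0; i < calls; i++)
        fn(dst, src);
    end = clock();
    return (long) ((double) (end - start) * 1e6 / CLOCKS_PER_SEC);
}
```

A driver would place the test string in a 64-byte buffer and call, for
example, time_calls(copy_strncpy, dst, src, 100000000L) once per variant.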

This makes the overhead of the additional logic clear. memcpy() costs
approximately the same as doing nothing at all. strncpy() is always
expensive, regardless of the string length. strlcpy() is often more
expensive than memcpy(), except in the empty string case.
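
To put the totals above on a per-call basis, here is a small sketch of
the arithmetic. The helper names ns_per_call and calls_per_us are made
up for illustration; the only inputs are the 100 million calls per run
and the microsecond totals quoted above. For the long string this works
out to roughly 6.2 ns per call for NONE, 6.8 ns for MEMCPY, 79.5 ns for
STRNCPY and 100.4 ns for STRLCPY.

```c
#include <assert.h>
#include <stdio.h>

#define CALLS 100000000.0   /* 100 million calls per run, as stated above */

/* Average cost of one call in nanoseconds, given the total run time in us. */
double ns_per_call(double total_us)
{
    assert(total_us >= 0.0);
    return total_us * 1000.0 / CALLS;
}

/* Average number of calls completed per microsecond. */
double calls_per_us(double total_us)
{
    assert(total_us > 0.0);
    return CALLS / total_us;
}

/* Prints per-call figures for the long-string run quoted in this post. */
void print_long_string_costs(void)
{
    printf("NONE:    %.2f ns/call\n", ns_per_call(620268.0));
    printf("MEMCPY:  %.2f ns/call\n", ns_per_call(683135.0));
    printf("STRNCPY: %.2f ns/call\n", ns_per_call(7952930.0));
    printf("STRLCPY: %.2f ns/call\n", ns_per_call(10042364.0));
}
```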

These tests do not properly model the effects of real memory; they do,
however, model the effects of cache memory. I would suggest that the
results are exaggerated, but not invalid.

For anybody doubting the none vs. memcpy result, I've included the
generated assembly code. I chalk it entirely up to fully utilizing the
parallelization capability of the CPU. Although 16 movq instructions
are executed, they can be executed fully in parallel.

To me, this suggests that all of these operations are pretty fast. Are
we sure this is a real bottleneck? Even the slowest operation above,
strlcpy() on a very long string, appears to execute about 10 calls per
microsecond. Perhaps my tests are too easy for my CPU and I need to
make it access many different 64-byte blocks? :-)
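
One way the "many different 64-byte blocks" idea could be tried is
sketched below. This was not part of the posted test. The function name
copy_across_blocks and its nblocks and step parameters are hypothetical.
The assumption is that the caller allocates an arena much larger than
the CPU cache and picks a step that shares no common factor with
nblocks, so that every block is visited before any block repeats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK 64   /* bytes copied per call, as in the tests above */

/* Copies BLOCK bytes from src into a different 64-byte block of arena
 * on each call, advancing by step blocks and wrapping at nblocks.
 * src must be at least BLOCK bytes; arena at least nblocks * BLOCK.
 * Returns the index of the block the next call would have used. */
size_t copy_across_blocks(char *arena, size_t nblocks, size_t step,
                          const char *src, size_t calls)
{
    size_t block = 0;

    assert(nblocks > 0);
    for (size_t i = 0; i < calls; i++) {
        memcpy(arena + block * BLOCK, src, BLOCK);
        block = (block + step) % nblocks;
    }
    return block;
}
```

Timing this loop against the single-block version would show how much
of the per-call cost comes from memory rather than from the copy logic.
A step of 1 walks the arena sequentially, which hardware prefetching
may hide, so a larger step is probably the more interesting case.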

Cheers,
mark

--
mark(at)mielke(dot)cc / markm(at)ncf(dot)ca / markm(at)nortel(dot)com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

Attachment Content-Type Size
x.s text/plain 1.7 KB
