Re: Faster StrNCpy

From:	mark(at)mark(dot)mielke(dot)cc
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Faster StrNCpy
Date:	2006-09-29 21:59:17
Message-ID:	20060929215917.GC30048@mark.mielke.cc
Lists:	pgsql-hackers pgsql-patches

On Fri, Sep 29, 2006 at 05:34:30PM -0400, Tom Lane wrote:
> mark(at)mark(dot)mielke(dot)cc writes:
> > If anybody is curious, here are my numbers for an AMD X2 3800+:
> You did not show your C code, so no one else can reproduce the test on
> other hardware. However, it looks like your compiler has unrolled the
> memcpy into straight-line 8-byte moves, which makes it pretty hard for
> anything operating byte-wise to compete, and is a bit dubious for the
> general case anyway (since it requires assuming that the size and
> alignment are known at compile time).

I did show the .s code. I call into x_memcpy(a, b), meaning that the
compiler can't assume anything. It may happen to be aligned.
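
A minimal sketch of that kind of call boundary, for readers without the attachment at hand. This is not the attached x.c: the name x_memcpy_sketch, the BLOCK_SIZE constant, and the use of GCC's noinline attribute (standing in for a separate source file) are assumptions made for illustration only.

```c
#include <string.h>

#define BLOCK_SIZE 64  /* assumed: matches the 64-byte blocks used in the tests */

/* Hypothetical stand-in for a wrapper compiled in its own file.
 * noinline (a GCC extension) keeps the optimizer from seeing the
 * caller's buffers, so it cannot assume any particular alignment. */
__attribute__((noinline))
void x_memcpy_sketch(char *dst, const char *src)
{
    memcpy(dst, src, BLOCK_SIZE);
}
```

The size is still a compile-time constant inside the wrapper, so the compiler may well unroll the copy, but it has to emit moves that are valid for any alignment of dst and src.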

Here are results over 64 Mbytes of memory, to ensure that every call is
a cache miss:

$ gcc -O3 -std=c99 -DSTRING='"This is a very long sentence that is expected to be very slow."' -DN="(1024*1024)" -o x x.c y.c strlcpy.c ; ./x
NONE: 767243 us
MEMCPY: 6044137 us
STRNCPY: 10741759 us
STRLCPY: 12061630 us
LENCPY: 9459099 us

$ gcc -O3 -std=c99 -DSTRING='"Short sentence."' -DN="(1024*1024)" -o x x.c y.c strlcpy.c ; ./x
NONE: 712193 us
MEMCPY: 6072312 us
STRNCPY: 9982983 us
STRLCPY: 6605052 us
LENCPY: 7128258 us

$ gcc -O3 -std=c99 -DSTRING='""' -DN="(1024*1024)" -o x x.c y.c strlcpy.c ; ./x
NONE: 708164 us
MEMCPY: 6042817 us
STRNCPY: 8885791 us
STRLCPY: 5592477 us
LENCPY: 6135550 us

At least on my machine, memcpy() still comes out on top. That does assume
the data is aligned correctly for the machine. Here is the unaligned case
(all arrays are stored at a +1 byte offset in memory):

$ gcc -O3 -std=c99 -DSTRING='"This is a very long sentence that is expected to be very slow."' -DN="(1024*1024)" -DALIGN=1 -o x x.c y.c strlcpy.c ; ./x
NONE: 790932 us
MEMCPY: 6591559 us
STRNCPY: 10622291 us
STRLCPY: 12070007 us
LENCPY: 10322541 us

$ gcc -O3 -std=c99 -DSTRING='"Short sentence."' -DN="(1024*1024)" -DALIGN=1 -o x x.c y.c strlcpy.c ; ./x
NONE: 764577 us
MEMCPY: 6631731 us
STRNCPY: 9513540 us
STRLCPY: 6615345 us
LENCPY: 7263392 us

$ gcc -O3 -std=c99 -DSTRING='""' -DN="(1024*1024)" -DALIGN=1 -o x x.c y.c strlcpy.c ; ./x
NONE: 825689 us
MEMCPY: 6607777 us
STRNCPY: 8976487 us
STRLCPY: 5878088 us
LENCPY: 6180358 us

Alignment does look like it impacts the results for memcpy(). memcpy()
changes from around 6.0 seconds to 6.6 seconds (roughly 9% slower). Overall,
though, it is still the winner in all cases except against strlcpy(), which
beats it on very short strings ("").

Here is the cache hit case including your strlen+memcpy as 'LENCPY':

$ gcc -O3 -std=c99 -DSTRING='"This is a very long sentence that is expected to be very slow."' -DN=1 -o x x.c y.c strlcpy.c ; ./x
NONE: 696157 us
MEMCPY: 825118 us
STRNCPY: 7983159 us
STRLCPY: 10787462 us
LENCPY: 6048339 us

$ gcc -O3 -std=c99 -DSTRING='"Short sentence."' -DN=1 -o x x.c y.c strlcpy.c ; ./x
NONE: 700201 us
MEMCPY: 593701 us
STRNCPY: 7577380 us
STRLCPY: 3727801 us
LENCPY: 3169783 us

$ gcc -O3 -std=c99 -DSTRING='""' -DN=1 -o x x.c y.c strlcpy.c ; ./x
NONE: 706283 us
MEMCPY: 792719 us
STRNCPY: 7870425 us
STRLCPY: 681334 us
LENCPY: 2062983 us
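
For reference, a strlen+memcpy copy of the kind labelled LENCPY can be sketched as below. This is an illustration, not the code in the attached x.c: the name lencpy_sketch, the strlcpy-style size argument, and the return value are assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical strlen+memcpy copy with strlcpy-like truncation:
 * copies at most size-1 bytes and always NUL-terminates when size > 0.
 * Returns strlen(src) so a caller can detect truncation. */
size_t lencpy_sketch(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);

    if (size > 0) {
        size_t n = (len < size) ? len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}
```

Unlike strncpy(), this does not zero-pad the rest of the destination, so the work done scales with the string length rather than the buffer size. It does walk the source twice, once in strlen() and once in memcpy().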

In my first test, every call was a cache hit. In the N=1024*1024 runs here,
every call is a cache miss, and the 64-byte blocks are spread evenly over
64 Mbytes of memory. I've attached the code for your consideration. x.c
contains the routines I used to perform the tests. y.c is the main program.
strlcpy.c is copied unchanged from the online reference. The compilation steps
are described above. STRING is the string to try out. N is the number
of 64-byte blocks to allocate. ALIGN is the number of bytes to offset
the array by when storing / reading / writing. ALIGN should be >= 0.

At N=1, it's all in cache. At N=1024*1024 it is taking up 64 Mbytes of
RAM.
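
The roles of N and ALIGN can be illustrated with a rough timing helper like the one below. It is a sketch under stated assumptions, not the attached y.c: the function name, the caller-supplied buffer, the single in-order pass over the blocks, and the use of gettimeofday() for microsecond timing are all hypothetical.

```c
#include <stddef.h>
#include <string.h>
#include <sys/time.h>

#define BLOCK_SIZE 64  /* assumed: one 64-byte block per copy */

/* Copies src (including its NUL) into each of n_blocks 64-byte slots,
 * starting `align` bytes into buf, and returns the elapsed time in
 * microseconds. buf must hold n_blocks * BLOCK_SIZE + align bytes.
 * Returns -1 if src does not fit in one block. Hypothetical helper. */
long time_copies_sketch(char *buf, size_t n_blocks, size_t align,
                        const char *src)
{
    size_t len = strlen(src) + 1;
    struct timeval t0, t1;

    if (len > BLOCK_SIZE)
        return -1;

    gettimeofday(&t0, NULL);
    for (size_t i = 0; i < n_blocks; i++)
        memcpy(buf + align + i * BLOCK_SIZE, src, len);
    gettimeofday(&t1, NULL);

    return (t1.tv_sec - t0.tv_sec) * 1000000L + (t1.tv_usec - t0.tv_usec);
}
```

With n_blocks = 1 the same 64 bytes are rewritten every time and stay in cache; with n_blocks = 1024*1024 the buffer covers 64 Mbytes. A real run would repeat the pass many times to get timings in the range of seconds.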

Cheers,
mark

--
mark(at)mielke(dot)cc / markm(at)ncf(dot)ca / markm(at)nortel(dot)com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

Attachment	Content-Type	Size
x.c	text/plain	580 bytes
y.c	text/plain	2.1 KB
strlcpy.c	text/plain	1.8 KB

In response to

Re: Faster StrNCpy at 2006-09-29 21:34:30 from Tom Lane

Responses

Re: Faster StrNCpy at 2006-10-02 16:06:35 from Strong, David
Re: Faster StrNCpy at 2006-10-02 18:30:11 from Tom Lane
Re: Faster StrNCpy at 2006-10-02 18:39:49 from Luke Lonergan
