Re: Some platform-specific MemSet research

From: "Rocco Altier" <RoccoA(at)Routescape(dot)com>
To: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Seneca Cunningham" <scunning(at)ca(dot)afilias(dot)info>
Cc: "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Some platform-specific MemSet research
Date: 2006-02-02 17:42:17
Message-ID: 6E0907A94904D94B99D7F387E08C4F57C62740@FALCON.INSIGHT
Lists: pgsql-hackers pgsql-patches

I wanted to chime in that I also see this speedup when compiling with
XLC 6.0 (IBM's cc), even in 32-bit mode. I have tested on AIX 5.2 and 5.1.

I think this would be good to include in the regular release.

I'm not sure how many people running older versions of AIX would want
a new version of postgres.

-rocco

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Bruce Momjian
> Sent: Wednesday, February 01, 2006 12:11 PM
> To: Seneca Cunningham
> Cc: Martijn van Oosterhout; pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] Some platform-specific MemSet research
>
>
>
> My guess is that there is some really fast assembler for memory copy on
> AIX, and only libc memset() has it. If you want, we can make
> MEMSET_LOOP_LIMIT in c.h a configure value, and allow template/aix to
> set it to zero, causing memset() to be always used.
>
> Are you prepared to make this optimization decision for all AIX users
> using gcc, or only for certain versions?
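A minimal sketch of the configure-value idea, assuming a compile-time override: MEMSET_LOOP_LIMIT defaults to 1024 (the size cutoff Seneca mentions below) unless the build passes something like -DMEMSET_LOOP_LIMIT=0, which is how a platform template could force libc memset(). The helper name memset_sketch and its int return value (reporting which path ran) are hypothetical, and the body is a simplified stand-in, not the actual c.h macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical default; a configure script or platform template could
 * override it on the compiler command line, e.g. -DMEMSET_LOOP_LIMIT=0. */
#ifndef MEMSET_LOOP_LIMIT
#define MEMSET_LOOP_LIMIT 1024
#endif

#define LONG_ALIGN_MASK (sizeof(long) - 1)

/*
 * Simplified MemSet-style helper (hypothetical, not the real c.h macro).
 * Uses an inline word loop only for zero fills of small, long-aligned,
 * long-sized buffers; everything else goes to libc memset().
 * Returns 1 if the inline loop was used, 0 if memset() was called.
 * With MEMSET_LOOP_LIMIT set to 0 the loop is never taken.
 */
static int memset_sketch(void *start, int val, size_t len)
{
    if (MEMSET_LOOP_LIMIT != 0 &&
        len <= MEMSET_LOOP_LIMIT &&
        val == 0 &&
        ((uintptr_t) start & LONG_ALIGN_MASK) == 0 &&
        (len & LONG_ALIGN_MASK) == 0)
    {
        long *p = (long *) start;
        long *stop = (long *) ((char *) start + len);

        while (p < stop)
            *p++ = 0;
        return 1;
    }

    memset(start, val, len);
    return 0;
}
```

Building the same file with -DMEMSET_LOOP_LIMIT=0 makes every call return 0, i.e. always defer to libc memset(), which is the AIX behaviour being proposed.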
>
> -----------------------------------------------------------------------------
>
> Seneca Cunningham wrote:
> > Martijn van Oosterhout wrote:
> > > On Tue, Jan 24, 2006 at 05:24:28PM -0500, Seneca Cunningham wrote:
> > >
> > >>After reading the post on -patches proposing that MemSet be changed to
> > >>use long instead of int32 on the grounds that a pair of x86-64 Linux
> > >>boxes took less time to execute the long code 64*10^6 times[1], I took
> > >>a look at how the test code performed on AIX with gcc. While the switch
> > >>to long did result in a minor performance improvement, dropping the
> > >>MemSetLoop in favour of the native memset resulted in the tests taking
> > >>about 25% of the time taken by the MemSetLoop-like int loop. The 32-bit
> > >>Linux system I ran the expanded tests on showed that, for the range of
> > >>buffer sizes in which postgres can use the looping MemSet instead of
> > >>memset (size <= 1024 bytes), MemSet generally had better performance.
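The kind of benchmark described here can be sketched as follows, assuming the structure the listings further down suggest: zero a buffer with an int loop, a long loop, or libc memset(), repeat many times, and bracket each run with gettimeofday(). The function names and the iteration parameter are hypothetical; the original test program is not reproduced in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/time.h>

/* MemSetLoop-like int variant: zero the buffer one int at a time.
 * Assumes buf is int-aligned and len is a multiple of sizeof(int). */
static void zero_int_loop(void *buf, size_t len)
{
    int *p = buf;
    int *stop = (int *) ((char *) buf + len);

    while (p < stop)
        *p++ = 0;
}

/* Long variant: same loop, but each store covers sizeof(long) bytes.
 * Assumes buf is long-aligned and len is a multiple of sizeof(long). */
static void zero_long_loop(void *buf, size_t len)
{
    long *p = buf;
    long *stop = (long *) ((char *) buf + len);

    while (p < stop)
        *p++ = 0;
}

/* Native variant: defer to libc memset(). */
static void zero_libc(void *buf, size_t len)
{
    memset(buf, 0, len);
}

/* Microseconds elapsed between two gettimeofday() samples. */
static long long elapsed_usec(const struct timeval *start,
                              const struct timeval *end)
{
    return (long long) (end->tv_sec - start->tv_sec) * 1000000LL
        + (end->tv_usec - start->tv_usec);
}

/*
 * Time `iterations` calls of fn on a len-byte buffer (the thread used
 * 64000000 iterations). The compiler may still inline fn or drop
 * repeated stores, so the generated assembly should be checked.
 */
static long long time_zeroing(void (*fn)(void *, size_t), void *buf,
                              size_t len, long iterations)
{
    struct timeval start, end;

    gettimeofday(&start, NULL);
    for (long i = 0; i < iterations; i++)
        fn(buf, len);
    gettimeofday(&end, NULL);
    return elapsed_usec(&start, &end);
}
```

Comparing time_zeroing(zero_int_loop, ...), time_zeroing(zero_long_loop, ...) and time_zeroing(zero_libc, ...) on the same buffer gives the three numbers being compared in this thread.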
> > >
> > >
> > > Could you please check the asm output to see what's going on? We've
> > > had tests like these produce odd results in the past because the
> > > compiler optimised away stuff that didn't have any effect. Since every
> > > memset after the first is a no-op, you want to make sure it's still
> > > actually doing the work...
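One common way to keep a compiler from optimising away repeated memset calls in a test like this (a general technique, not necessarily what was done in this thread) is to call memset through a volatile function pointer: the compiler must reload the pointer on every iteration and cannot assume it still refers to memset, so it can neither inline the call nor drop it as redundant. The names memset_ptr and zero_repeatedly are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A volatile-qualified pointer must be re-read on every use, so the
 * compiler cannot prove it still refers to memset and has to emit a
 * real call each time. */
static void *(*volatile memset_ptr)(void *, int, size_t) = memset;

/* Zero buf `iterations` times; every call is actually performed. */
static void zero_repeatedly(void *buf, size_t len, long iterations)
{
    for (long i = 0; i < iterations; i++)
        memset_ptr(buf, 0, len);
}
```

The trade-off is that every call becomes an indirect call, so small constant-size fills that the compiler would normally expand inline are forced out of line; the generated assembly remains the thing to check.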
> >
> > Well, on both Linux and AIX, all 30 of the 64000000-iteration loops
> > from the source exist (10 int, 10 long, 10 memset). According to my
> > understanding of the assembler, memset itself is only called for sizes
> > of at least 64 bytes on both platforms, and it is called in each
> > iteration.
> >
> > The assembler for the 64-byte loops, with line numbers prepended (first
> > loop: MemSetLoop int variant; second loop: memset; third loop:
> > MemSetLoop long variant):
> >
> > 64-bit AIX:
> >
> > 419 addi 3,1,112
> > 420 li 4,0
> > 421 bl .gettimeofday
> > 422 nop
> > 423 lis 10,0x3d0
> > 424 cmpld 6,26,16
> > 425 li 11,0
> > 426 ori 10,10,36864
> > 427 L..41:
> > 428 bge 6,L..42
> > 429 mr 9,26
> > 430 li 0,0
> > 431 L..44:
> > 432 stw 0,0(9)
> > 433 addi 9,9,4
> > 434 cmpld 7,16,9
> > 435 bgt 7,L..44
> > 436 L..42:
> > 437 addi 0,11,1
> > 438 extsw 11,0
> > 439 cmpw 7,11,10
> > 440 bne+ 7,L..41
> > 441 li 4,0
> > 442 mr 3,22
> > 443 lis 25,0x3d0
> > 444 li 28,0
> > 445 bl .gettimeofday
> > 446 nop
> > 447 li 4,64
> > 448 addi 5,1,112
> > 449 ld 3,LC..9(2)
> > 450 mr 6,22
> > 451 ori 25,25,36864
> > 452 bl .print_time
> > 453 addi 3,1,112
> > 454 li 4,0
> > 455 bl .gettimeofday
> > 456 nop
> > 457 L..46:
> > 458 mr 3,26
> > 459 li 4,0
> > 460 li 5,64
> > 461 bl .memset
> > 462 nop
> > 463 addi 0,28,1
> > 464 extsw 28,0
> > 465 cmpw 7,28,25
> > 466 bne+ 7,L..46
> > 467 li 4,0
> > 468 mr 3,22
> > 469 bl .gettimeofday
> > 470 nop
> > 471 li 4,64
> > 472 addi 5,1,112
> > 473 ld 3,LC..11(2)
> > 474 mr 6,22
> > 475 bl .print_time
> > 476 addi 3,1,112
> > 477 li 4,0
> > 478 bl .gettimeofday
> > 479 nop
> > 480 lis 10,0x3d0
> > 481 cmpld 6,26,16
> > 482 li 11,0
> > 483 ori 10,10,36864
> > 484 L..48:
> > 485 bge 6,L..49
> > 486 mr 9,26
> > 487 li 0,0
> > 488 L..51:
> > 489 std 0,0(9)
> > 490 addi 9,9,8
> > 491 cmpld 7,9,16
> > 492 blt 7,L..51
> > 493 L..49:
> > 494 addi 0,11,1
> > 495 extsw 11,0
> > 496 cmpw 7,11,10
> > 497 bne+ 7,L..48
> > 498 li 4,0
> > 499 mr 3,22
> > 500 bl .gettimeofday
> > 501 nop
> > 502 li 4,64
> > 503 addi 5,1,112
> > 504 ld 3,LC..13(2)
> > 505 mr 6,22
> > 506 bl .print_time
> >
> >
> > 32-bit Linux:
> >
> > 387 popl %ecx
> > 388 popl %edi
> > 389 pushl $0
> > 390 leal -20(%ebp), %edx
> > 391 pushl %edx
> > 392 call gettimeofday
> > 393 xorl %edx, %edx
> > 394 addl $16, %esp
> > 395 .L41:
> > 396 movl -4160(%ebp), %eax
> > 397 cmpl %eax, -4144(%ebp)
> > 398 jae .L42
> > 399 movl -4144(%ebp), %eax
> > 400 .L44:
> > 401 movl $0, (%eax)
> > 402 addl $4, %eax
> > 403 cmpl %eax, -4160(%ebp)
> > 404 ja .L44
> > 405 .L42:
> > 406 incl %edx
> > 407 cmpl $64000000, %edx
> > 408 jne .L41
> > 409 subl $8, %esp
> > 410 pushl $0
> > 411 leal -28(%ebp), %edx
> > 412 pushl %edx
> > 413 call gettimeofday
> > 414 leal -28(%ebp), %eax
> > 415 movl %eax, (%esp)
> > 416 leal -20(%ebp), %ecx
> > 417 movl $64, %edx
> > 418 movl $.LC5, %eax
> > 419 call print_time
> > 420 popl %eax
> > 421 popl %edx
> > 422 pushl $0
> > 423 leal -20(%ebp), %edx
> > 424 pushl %edx
> > 425 call gettimeofday
> > 426 xorl %edi, %edi
> > 427 addl $16, %esp
> > 428 .L46:
> > 429 pushl %eax
> > 430 pushl $64
> > 431 pushl $0
> > 432 movl -4144(%ebp), %ecx
> > 433 pushl %ecx
> > 434 call memset
> > 435 incl %edi
> > 436 addl $16, %esp
> > 437 cmpl $64000000, %edi
> > 438 jne .L46
> > 439 subl $8, %esp
> > 440 pushl $0
> > 441 leal -28(%ebp), %eax
> > 442 pushl %eax
> > 443 call gettimeofday
> > 444 leal -28(%ebp), %edx
> > 445 movl %edx, (%esp)
> > 446 leal -20(%ebp), %ecx
> > 447 movl $64, %edx
> > 448 movl $.LC6, %eax
> > 449 call print_time
> > 450 popl %eax
> > 451 popl %edx
> > 452 pushl $0
> > 453 leal -20(%ebp), %eax
> > 454 pushl %eax
> > 455 call gettimeofday
> > 456 xorl %edx, %edx
> > 457 addl $16, %esp
> > 458 .L48:
> > 459 movl -4160(%ebp), %eax
> > 460 cmpl %eax, -4144(%ebp)
> > 461 jae .L49
> > 462 movl -4144(%ebp), %eax
> > 463 .L51:
> > 464 movl $0, (%eax)
> > 465 addl $4, %eax
> > 466 cmpl -4160(%ebp), %eax
> > 467 jb .L51
> > 468 .L49:
> > 469 incl %edx
> > 470 cmpl $64000000, %edx
> > 471 jne .L48
> > 472 subl $8, %esp
> > 473 pushl $0
> > 474 leal -28(%ebp), %edx
> > 475 pushl %edx
> > 476 call gettimeofday
> > 477 leal -28(%ebp), %eax
> > 478 movl %eax, (%esp)
> > 479 leal -20(%ebp), %ecx
> > 480 movl $64, %edx
> > 481 movl $.LC7, %eax
> > 482 call print_time
> >
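Reading the listings above, the int loop stores 4 bytes per iteration on both platforms (stw with addi 9,9,4 on AIX; movl $0 with addl $4 on Linux), while the long loop stores 8 bytes per iteration on 64-bit AIX (std with addi 9,9,8) but still 4 bytes on 32-bit Linux, where long is 32 bits. A small worked calculation of the resulting store counts, with hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>

/* Stores a word-at-a-time zeroing loop issues for one len-byte buffer,
 * given the width in bytes of the word type it stores. */
static unsigned long long stores_per_call(size_t len, size_t word_size)
{
    assert(word_size > 0 && len % word_size == 0);
    return len / word_size;
}

/* Total stores over a whole benchmark run of `iterations` calls. */
static unsigned long long stores_per_run(size_t len, size_t word_size,
                                         unsigned long long iterations)
{
    return stores_per_call(len, word_size) * iterations;
}
```

For the 64-byte case the long variant halves the store count on 64-bit AIX (8 instead of 16 per call, 512000000 instead of 1024000000 per 64000000-iteration run) and changes nothing on 32-bit Linux, which matches the essentially identical int and long loops in the Linux listing.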
> > --
> > Seneca Cunningham
> > scunning(at)ca(dot)afilias(dot)info
> >
>
> --
> Bruce Momjian | http://candle.pha.pa.us
> pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
> + If your life is a hard drive, | 13 Roberts Road
> + Christ can be your backup. | Newtown Square, Pennsylvania 19073
>
