Re: Some platform-specific MemSet research

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Rocco Altier <RoccoA(at)Routescape(dot)com>
Cc: Seneca Cunningham <scunning(at)ca(dot)afilias(dot)info>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Some platform-specific MemSet research
Date: 2006-02-02 17:48:37
Message-ID: 200602021748.k12Hmb629602@candle.pha.pa.us
Lists: pgsql-hackers pgsql-patches

Rocco Altier wrote:
> I wanted to chime in that I also see this speedup from using XLC 6.0
> (IBM's cc), even in 32bit mode. I have tested on AIX 5.2 and 5.1.
>
> I think this would be good to include in the regular release.
>
> Not sure how many people are running older versions of AIX that would
> want a new version of postgres.
>

OK, perfect. I will work on making it platform-specific and report
back.
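
As a rough sketch of the idea only (this is not the actual c.h code, and
the names SKETCH_MEMSET_LOOP_LIMIT and sketch_zero are made up for
illustration), a build-time limit could select between the inline word
loop and libc memset() along these lines:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for MEMSET_LOOP_LIMIT. A platform template could
 * define it as 0 at build time, which sends every non-empty request to
 * memset().
 */
#ifndef SKETCH_MEMSET_LOOP_LIMIT
#define SKETCH_MEMSET_LOOP_LIMIT 1024
#endif

/*
 * Zero len bytes at dst. Returns 1 if the inline 32-bit store loop handled
 * the request, 0 if it was passed to memset().
 */
static int
sketch_zero(void *dst, size_t len)
{
    assert(dst != NULL);

    /*
     * The word loop is only usable when the start is 4-byte aligned, the
     * length is a whole number of words, and the length is within the limit.
     */
    if (((uintptr_t) dst % sizeof(int32_t)) == 0 &&
        (len % sizeof(int32_t)) == 0 &&
        len <= SKETCH_MEMSET_LOOP_LIMIT)
    {
        int32_t *p = (int32_t *) dst;
        int32_t *stop = (int32_t *) ((char *) dst + len);

        while (p < stop)
            *p++ = 0;
        return 1;
    }

    /* Unaligned, odd-sized, or over the limit: let libc do it. */
    memset(dst, 0, len);
    return 0;
}
```

Compiling with -DSKETCH_MEMSET_LOOP_LIMIT=0 would be the rough equivalent
of what template/aix would request.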

---------------------------------------------------------------------------

> -rocco
>
>
>
> > -----Original Message-----
> > From: pgsql-hackers-owner(at)postgresql(dot)org
> > [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Bruce Momjian
> > Sent: Wednesday, February 01, 2006 12:11 PM
> > To: Seneca Cunningham
> > Cc: Martijn van Oosterhout; pgsql-hackers(at)postgresql(dot)org
> > Subject: Re: [HACKERS] Some platform-specific MemSet research
> >
> >
> >
> > My guess is that there is some really fast assembler for clearing
> > memory on AIX, and only libc memset() has it. If you want, we can make
> > MEMSET_LOOP_LIMIT in c.h a configure value, and allow template/aix to
> > set it to zero, causing memset() to always be used.
> >
> > Are you prepared to make this optimization decision for all AIX users
> > using gcc, or only for certain versions?
> >
> > ---------------------------------------------------------------------------
> >
> > Seneca Cunningham wrote:
> > > Martijn van Oosterhout wrote:
> > > > On Tue, Jan 24, 2006 at 05:24:28PM -0500, Seneca Cunningham wrote:
> > > >
> > > >>After reading the post on -patches proposing that MemSet be changed to
> > > >>use long instead of int32 on the grounds that a pair of x86-64 linux
> > > >>boxes took less time to execute the long code 64*10^6 times[1], I took a
> > > >>look at how the testcode performed on AIX with gcc. While the switch to
> > > >>long did result in a minor performance improvement, dropping the
> > > >>MemSetLoop in favour of the native memset resulted in the tests taking
> > > >>~25% of the time of the MemSetLoop-like int loop. The 32-bit linux
> > > >>system I ran the expanded tests on showed that for the buffer size
> > > >>range in which postgres can use the looping MemSet instead of memset
> > > >>(size <= 1024 bytes), MemSet generally had better performance.
> > > >
> > > >
> > > > Could you please check the asm output to see what's going on? We've had
> > > > tests like these produce odd results in the past because the compiler
> > > > optimised away stuff that didn't have any effect. Since every memset
> > > > after the first is a no-op, you want to make sure it's still actually
> > > > doing the work...
> > >
> > > Well, on both linux and AIX, all 30 of the 64000000-iteration loops
> > > from the source exist (10 int, 10 long, 10 memset). According to my
> > > understanding of the assembler, memset itself is only called for sizes
> > > >= 64 bytes on both platforms, and the memset is called in each
> > > iteration.
> > >
> > > The assembler for the 64 byte loops, with prepended line number, first
> > > loop MemSetLoop int-variant, second loop memset, third loop MemSetLoop
> > > long-variant:
> > >
> > > 64-bit AIX:
> > >
> > > 419 addi 3,1,112
> > > 420 li 4,0
> > > 421 bl .gettimeofday
> > > 422 nop
> > > 423 lis 10,0x3d0
> > > 424 cmpld 6,26,16
> > > 425 li 11,0
> > > 426 ori 10,10,36864
> > > 427 L..41:
> > > 428 bge 6,L..42
> > > 429 mr 9,26
> > > 430 li 0,0
> > > 431 L..44:
> > > 432 stw 0,0(9)
> > > 433 addi 9,9,4
> > > 434 cmpld 7,16,9
> > > 435 bgt 7,L..44
> > > 436 L..42:
> > > 437 addi 0,11,1
> > > 438 extsw 11,0
> > > 439 cmpw 7,11,10
> > > 440 bne+ 7,L..41
> > > 441 li 4,0
> > > 442 mr 3,22
> > > 443 lis 25,0x3d0
> > > 444 li 28,0
> > > 445 bl .gettimeofday
> > > 446 nop
> > > 447 li 4,64
> > > 448 addi 5,1,112
> > > 449 ld 3,LC..9(2)
> > > 450 mr 6,22
> > > 451 ori 25,25,36864
> > > 452 bl .print_time
> > > 453 addi 3,1,112
> > > 454 li 4,0
> > > 455 bl .gettimeofday
> > > 456 nop
> > > 457 L..46:
> > > 458 mr 3,26
> > > 459 li 4,0
> > > 460 li 5,64
> > > 461 bl .memset
> > > 462 nop
> > > 463 addi 0,28,1
> > > 464 extsw 28,0
> > > 465 cmpw 7,28,25
> > > 466 bne+ 7,L..46
> > > 467 li 4,0
> > > 468 mr 3,22
> > > 469 bl .gettimeofday
> > > 470 nop
> > > 471 li 4,64
> > > 472 addi 5,1,112
> > > 473 ld 3,LC..11(2)
> > > 474 mr 6,22
> > > 475 bl .print_time
> > > 476 addi 3,1,112
> > > 477 li 4,0
> > > 478 bl .gettimeofday
> > > 479 nop
> > > 480 lis 10,0x3d0
> > > 481 cmpld 6,26,16
> > > 482 li 11,0
> > > 483 ori 10,10,36864
> > > 484 L..48:
> > > 485 bge 6,L..49
> > > 486 mr 9,26
> > > 487 li 0,0
> > > 488 L..51:
> > > 489 std 0,0(9)
> > > 490 addi 9,9,8
> > > 491 cmpld 7,9,16
> > > 492 blt 7,L..51
> > > 493 L..49:
> > > 494 addi 0,11,1
> > > 495 extsw 11,0
> > > 496 cmpw 7,11,10
> > > 497 bne+ 7,L..48
> > > 498 li 4,0
> > > 499 mr 3,22
> > > 500 bl .gettimeofday
> > > 501 nop
> > > 502 li 4,64
> > > 503 addi 5,1,112
> > > 504 ld 3,LC..13(2)
> > > 505 mr 6,22
> > > 506 bl .print_time
> > >
> > >
> > > 32-bit Linux:
> > >
> > > 387 popl %ecx
> > > 388 popl %edi
> > > 389 pushl $0
> > > 390 leal -20(%ebp), %edx
> > > 391 pushl %edx
> > > 392 call gettimeofday
> > > 393 xorl %edx, %edx
> > > 394 addl $16, %esp
> > > 395 .L41:
> > > 396 movl -4160(%ebp), %eax
> > > 397 cmpl %eax, -4144(%ebp)
> > > 398 jae .L42
> > > 399 movl -4144(%ebp), %eax
> > > 400 .L44:
> > > 401 movl $0, (%eax)
> > > 402 addl $4, %eax
> > > 403 cmpl %eax, -4160(%ebp)
> > > 404 ja .L44
> > > 405 .L42:
> > > 406 incl %edx
> > > 407 cmpl $64000000, %edx
> > > 408 jne .L41
> > > 409 subl $8, %esp
> > > 410 pushl $0
> > > 411 leal -28(%ebp), %edx
> > > 412 pushl %edx
> > > 413 call gettimeofday
> > > 414 leal -28(%ebp), %eax
> > > 415 movl %eax, (%esp)
> > > 416 leal -20(%ebp), %ecx
> > > 417 movl $64, %edx
> > > 418 movl $.LC5, %eax
> > > 419 call print_time
> > > 420 popl %eax
> > > 421 popl %edx
> > > 422 pushl $0
> > > 423 leal -20(%ebp), %edx
> > > 424 pushl %edx
> > > 425 call gettimeofday
> > > 426 xorl %edi, %edi
> > > 427 addl $16, %esp
> > > 428 .L46:
> > > 429 pushl %eax
> > > 430 pushl $64
> > > 431 pushl $0
> > > 432 movl -4144(%ebp), %ecx
> > > 433 pushl %ecx
> > > 434 call memset
> > > 435 incl %edi
> > > 436 addl $16, %esp
> > > 437 cmpl $64000000, %edi
> > > 438 jne .L46
> > > 439 subl $8, %esp
> > > 440 pushl $0
> > > 441 leal -28(%ebp), %eax
> > > 442 pushl %eax
> > > 443 call gettimeofday
> > > 444 leal -28(%ebp), %edx
> > > 445 movl %edx, (%esp)
> > > 446 leal -20(%ebp), %ecx
> > > 447 movl $64, %edx
> > > 448 movl $.LC6, %eax
> > > 449 call print_time
> > > 450 popl %eax
> > > 451 popl %edx
> > > 452 pushl $0
> > > 453 leal -20(%ebp), %eax
> > > 454 pushl %eax
> > > 455 call gettimeofday
> > > 456 xorl %edx, %edx
> > > 457 addl $16, %esp
> > > 458 .L48:
> > > 459 movl -4160(%ebp), %eax
> > > 460 cmpl %eax, -4144(%ebp)
> > > 461 jae .L49
> > > 462 movl -4144(%ebp), %eax
> > > 463 .L51:
> > > 464 movl $0, (%eax)
> > > 465 addl $4, %eax
> > > 466 cmpl -4160(%ebp), %eax
> > > 467 jb .L51
> > > 468 .L49:
> > > 469 incl %edx
> > > 470 cmpl $64000000, %edx
> > > 471 jne .L48
> > > 472 subl $8, %esp
> > > 473 pushl $0
> > > 474 leal -28(%ebp), %edx
> > > 475 pushl %edx
> > > 476 call gettimeofday
> > > 477 leal -28(%ebp), %eax
> > > 478 movl %eax, (%esp)
> > > 479 leal -20(%ebp), %ecx
> > > 480 movl $64, %edx
> > > 481 movl $.LC7, %eax
> > > 482 call print_time
> > >
> > > --
> > > Seneca Cunningham
> > > scunning(at)ca(dot)afilias(dot)info
> > >
> > > ---------------------------(end of broadcast)---------------------------
> > > TIP 5: don't forget to increase your free space map settings
> > >
> >
> > --
> > Bruce Momjian | http://candle.pha.pa.us
> > pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
> > + If your life is a hard drive, | 13 Roberts Road
> > + Christ can be your backup. | Newtown Square, Pennsylvania 19073
> >
> >
>
>

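For anyone wanting to repeat the comparison quoted above, here is a
minimal sketch of a timing harness. It is not the original test program,
which is not shown in this thread; loop_fill_int32, loop_fill_long and
time_fill are names invented here, and gettimeofday() is assumed to be
available (POSIX). Calling through a volatile function pointer is one
portable way to address Martijn's concern, since the compiler cannot
drop or merge the repeated fills.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/time.h>

/* Same signature as memset() so all three variants can share one timer. */
typedef void *(*fill_fn)(void *, int, size_t);

/* MemSetLoop-like fill with 32-bit stores. c is ignored (always zero);
 * dst must be 4-byte aligned and len a multiple of 4. */
static void *
loop_fill_int32(void *dst, int c, size_t len)
{
    int32_t *p = dst;
    int32_t *stop = (int32_t *) ((char *) dst + len);

    (void) c;
    while (p < stop)
        *p++ = 0;
    return dst;
}

/* Same loop with long-sized stores. dst must be aligned for long and
 * len a multiple of sizeof(long). */
static void *
loop_fill_long(void *dst, int c, size_t len)
{
    long *p = dst;
    long *stop = (long *) ((char *) dst + len);

    (void) c;
    while (p < stop)
        *p++ = 0;
    return dst;
}

/* Run fn over buf the given number of times; return elapsed microseconds. */
static long
time_fill(fill_fn fn, void *buf, size_t len, long iterations)
{
    /* volatile: the pointer is reloaded and called on every iteration */
    fill_fn volatile call = fn;
    struct timeval start, stop;
    long i;

    gettimeofday(&start, NULL);
    for (i = 0; i < iterations; i++)
        call(buf, 0, len);
    gettimeofday(&stop, NULL);

    return (long) (stop.tv_sec - start.tv_sec) * 1000000L +
           (long) (stop.tv_usec - start.tv_usec);
}
```

Passing loop_fill_int32, memset and loop_fill_long in turn, with len = 64
and 64000000 iterations, mirrors the three loops in the listings. Note
that the indirect call adds overhead that an inlined MemSet would not
have, so absolute numbers from this sketch are not comparable to the ones
reported above.
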
--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
