Re: Avoid stack frame setup in performance critical routines using tail calls

From: Andres Freund <andres(at)anarazel(dot)de>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: Avoid stack frame setup in performance critical routines using tail calls
Date: 2021-07-20 15:57:23
Message-ID: 20210720155723.dau4xqsnfq72uih5@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2021-07-20 19:37:46 +1200, David Rowley wrote:
> On Tue, 20 Jul 2021 at 19:04, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > * AllocateSetAlloc.txt
> > > * palloc.txt
> > > * percent.txt
> >
> > Huh, that's interesting. You have some control flow enforcement stuff turned on (the endbr64). And it looks like it has a non-zero cost (or maybe it's just skid). Did you enable that intentionally? If not, what compiler/version/distro is it? I think at least on GCC that's -fcf-protection=...
>
> It's Ubuntu 21.04 with gcc 10.3 (specifically gcc version 10.3.0
> (Ubuntu 10.3.0-1ubuntu1)).
>
> I've attached the same results from compiling with clang 12
> (12.0.0-3ubuntu1~21.04.1)

It looks like the Ubuntu folks have changed the default for CET to on: a
plain gcc -O2 build emits endbr64, and -fcf-protection=none removes it.

andres@ubuntu2020:~$ echo 'int foo(void) { return 17;}' > test.c && gcc -O2 -c -o test.o test.c && objdump -S test.o

test.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <foo>:
   0:   f3 0f 1e fa             endbr64
   4:   b8 11 00 00 00          mov    $0x11,%eax
   9:   c3                      retq

andres@ubuntu2020:~$ echo 'int foo(void) { return 17;}' > test.c && gcc -O2 -fcf-protection=none -c -o test.o test.c && objdump -S test.o

test.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <foo>:
   0:   b8 11 00 00 00          mov    $0x11,%eax
   5:   c3                      retq
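
For anyone who would rather not read disassembly, the same question can be
asked at compile time. The following is only a sketch, assuming GCC's
behaviour of defining the __CET__ macro when -fcf-protection is active
(bit 0 for branch tracking, bit 1 for the shadow stack). The names
COMPILED_CET_MASK, cet_mode_name() and print_compiled_cet() are made up for
illustration and are not part of PostgreSQL or of the patch discussed here.

```c
#include <stdio.h>

/*
 * Assumption: the compiler defines __CET__ when -fcf-protection is active,
 * with bit 0 = indirect branch tracking (endbr64) and bit 1 = shadow stack.
 * With -fcf-protection=none the macro is not defined at all.
 */
#ifdef __CET__
#define COMPILED_CET_MASK ((unsigned) __CET__)
#else
#define COMPILED_CET_MASK 0u
#endif

/* Hypothetical helper: map the two-bit mask to the matching option value. */
const char *
cet_mode_name(unsigned mask)
{
    static const char *const names[] = {"none", "branch", "return", "full"};

    return names[mask & 3u];
}

/* Hypothetical helper: report how this translation unit was compiled. */
void
print_compiled_cet(void)
{
    printf("-fcf-protection=%s\n", cet_mode_name(COMPILED_CET_MASK));
}
```

Calling print_compiled_cet() from a small main() built once with the
distribution's default flags and once with -fcf-protection=none should show
whether the default differs from "none" on a given system.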

Independent of this patch, it might be worth running a benchmark with
the default options, and one with -fcf-protection=none. None of my
machines support CET, so I can't do that comparison myself:

$ cpuid -1|grep CET
CET_SS: CET shadow stack = false
CET_IBT: CET indirect branch tracking = false
XCR0 supported: CET_U state = false
XCR0 supported: CET_S state = false
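
Where the cpuid tool is not installed, the first two lines of that output
can be approximated from C. This is a rough sketch, assuming GCC or clang on
x86 (it relies on __get_cpuid_count() from <cpuid.h>) and the bit positions
for CPUID leaf 7, subleaf 0: ECX bit 7 for CET_SS and EDX bit 20 for
CET_IBT. The CetCpuSupport type and query_cet_cpu_support() are hypothetical
names. It only reports what the CPU advertises, not whether the kernel
enables the feature.

```c
#include <stdio.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Feature bits in CPUID leaf 7, subleaf 0. */
#define CPUID7_ECX_CET_SS   (1u << 7)   /* CET shadow stack */
#define CPUID7_EDX_CET_IBT  (1u << 20)  /* CET indirect branch tracking */

/* Hypothetical result type for this sketch. */
typedef struct CetCpuSupport
{
    int known;          /* 0 if leaf 7 could not be queried */
    int shadow_stack;   /* 1 if the CPU advertises CET_SS */
    int ibt;            /* 1 if the CPU advertises CET_IBT */
} CetCpuSupport;

CetCpuSupport
query_cet_cpu_support(void)
{
    CetCpuSupport result = {0, 0, 0};

#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid_count() returns 0 when the requested leaf is unsupported. */
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
    {
        result.known = 1;
        result.shadow_stack = (ecx & CPUID7_ECX_CET_SS) != 0;
        result.ibt = (edx & CPUID7_EDX_CET_IBT) != 0;
    }
#endif
    return result;
}

/* Print the result in a form loosely modelled on the cpuid output above. */
void
print_cet_cpu_support(void)
{
    CetCpuSupport s = query_cet_cpu_support();

    if (!s.known)
    {
        printf("CPUID leaf 7 not available\n");
        return;
    }
    printf("CET_SS  = %s\n", s.shadow_stack ? "true" : "false");
    printf("CET_IBT = %s\n", s.ibt ? "true" : "false");
}
```

On non-x86 builds the query is compiled out and the result stays "unknown",
which keeps the sketch portable at the cost of saying nothing useful there.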

Here the CET instrumentation adds about 40kB of .text, but without
hardware support I can't measure its runtime overhead...

Greetings,

Andres Freund
