From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz> |
Subject: | Re: Avoid stack frame setup in performance critical routines using tail calls |
Date: | 2021-07-20 15:57:23 |
Message-ID: | 20210720155723.dau4xqsnfq72uih5@alap3.anarazel.de |
Lists: | pgsql-hackers |
Hi,
On 2021-07-20 19:37:46 +1200, David Rowley wrote:
> On Tue, 20 Jul 2021 at 19:04, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > * AllocateSetAlloc.txt
> > > * palloc.txt
> > > * percent.txt
> >
> > Huh, that's interesting. You have some control flow enforcement stuff turned on (the endbr64). And it looks like it has a non zero cost (or maybe it's just skid). Did you enable that intentionally? If not, what compiler/version/distro is it? I think at least on GCC that's -fcf-protection=...
>
> It's ubuntu 21.04 with gcc 10.3 (specifically gcc version 10.3.0
> (Ubuntu 10.3.0-1ubuntu1)
>
> I've attached the same results from compiling with clang 12
> (12.0.0-3ubuntu1~21.04.1)
It looks like the Ubuntu folks have turned CET on by default:

andres(at)ubuntu2020:~$ echo 'int foo(void) { return 17;}' > test.c && gcc -O2 -c -o test.o test.c && objdump -S test.o

test.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <foo>:
   0:   f3 0f 1e fa             endbr64
   4:   b8 11 00 00 00          mov    $0x11,%eax
   9:   c3                      retq

andres(at)ubuntu2020:~$ echo 'int foo(void) { return 17;}' > test.c && gcc -O2 -fcf-protection=none -c -o test.o test.c && objdump -S test.o

test.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <foo>:
   0:   b8 11 00 00 00          mov    $0x11,%eax
   5:   c3                      retq

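
As a quicker check than disassembling, the build mode can also be read from the preprocessor. The following is a minimal sketch, assuming GCC's documented behaviour of defining `__CET__` when -fcf-protection is active (bit 0 for indirect branch tracking, bit 1 for shadow stack); other compilers may not define it. The function name `cet_build_flags` and the `CET_DEMO_MAIN` guard are made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Report the -fcf-protection mode this translation unit was compiled with.
 * Assumption: the compiler defines __CET__ as GCC documents it
 * (bit 0 = branch tracking, i.e. endbr64 is emitted; bit 1 = shadow stack).
 * Returns 0 when the macro is absent, e.g. with -fcf-protection=none.
 */
static int cet_build_flags(void)
{
#ifdef __CET__
    return __CET__;
#else
    return 0;
#endif
}

#ifdef CET_DEMO_MAIN            /* hypothetical guard, only for this sketch */
int main(void)
{
    int flags = cet_build_flags();

    assert(flags >= 0 && flags <= 3);
    printf("branch tracking (endbr64): %s\n", (flags & 1) ? "on" : "off");
    printf("shadow stack:              %s\n", (flags & 2) ? "on" : "off");
    return 0;
}
#endif
```

Building it once with the distro defaults and once with -fcf-protection=none (plus -DCET_DEMO_MAIN) should show whether the compiler enables CET by default.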
Independent of this patch, it might be worth running one benchmark with
the default options and one with -fcf-protection=none. None of my
machines support CET...
$ cpuid -1|grep CET
CET_SS: CET shadow stack = false
CET_IBT: CET indirect branch tracking = false
XCR0 supported: CET_U state = false
XCR0 supported: CET_S state = false
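
The first two lines of that output can also be queried directly. This is a rough sketch, assuming an x86 target and the `<cpuid.h>` header shipped with GCC and clang; it reads CPUID leaf 7, subleaf 0, where ECX bit 7 reports CET shadow stack and EDX bit 20 reports CET indirect branch tracking. The struct and function names are made up for this illustration, and it only reports what the CPU advertises, not whether the kernel has enabled CET.

```c
#include <stdbool.h>
#include <stdio.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical result type for this sketch. */
struct cet_support
{
    bool shadow_stack;          /* CPUID.(EAX=7,ECX=0):ECX bit 7  */
    bool ibt;                   /* CPUID.(EAX=7,ECX=0):EDX bit 20 */
};

/*
 * Ask the CPU whether it advertises the CET features.
 * On non-x86 targets, or if leaf 7 is unavailable, both fields stay false.
 */
static struct cet_support query_cet_support(void)
{
    struct cet_support s = { false, false };

#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
    {
        s.shadow_stack = ((ecx >> 7) & 1) != 0;
        s.ibt = ((edx >> 20) & 1) != 0;
    }
#endif
    return s;
}

#ifdef CET_DEMO_MAIN            /* hypothetical guard, only for this sketch */
int main(void)
{
    struct cet_support s = query_cet_support();

    printf("CET_SS:  CET shadow stack             = %s\n",
           s.shadow_stack ? "true" : "false");
    printf("CET_IBT: CET indirect branch tracking = %s\n",
           s.ibt ? "true" : "false");
    return 0;
}
#endif
```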
Here, building with CET enabled adds about 40kB of .text, but without
hardware support I can't measure the CET overhead...
Greetings,
Andres Freund