v13: Performance regression related to FORTIFY_SOURCE

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>
Subject: v13: Performance regression related to FORTIFY_SOURCE
Date: 2020-04-19 22:07:22
Message-ID: 99b2eab335c1592c925d8143979c8e9e81e1575f.camel@j-davis.com
Lists: pgsql-hackers


There was a previous thread[1], but I think it needs some wider
discussion.

I brought up an issue where GCC, in combination with FORTIFY_SOURCE[2],
causes a performance regression in the logical tape code after the
introduction of LogicalTapeSetExtend()[3]. Unfortunately,
FORTIFY_SOURCE is enabled by default on Ubuntu. I have not observed
the problem with clang.

There is no apparent reason why the change should trigger the
regression, but it does. The slowdown is due to GCC switching to an
inlined version of memcpy() for the call in LogicalTapeWrite() at
logtape.c:768. The change[3] seems to have little, if anything, to do
with that call.

GCC's Object Size Checking[4] doc says:

"There are built-in functions added for many common
string operation functions, e.g., for memcpy
__builtin___memcpy_chk built-in is provided. This
built-in has an additional last argument, which is
the number of bytes remaining in the object the dest
argument points to or (size_t) -1 if the size is not
known. The built-in functions are optimized into the
normal string functions like memcpy if the last
argument is (size_t) -1 or if it is known at compile
time that the destination object will not be
overflowed..."

In other words, if GCC knows the size of the destination object, it
either verifies at compile time that the copy can never overflow it,
or it inserts a runtime check. But if it doesn't know the size of the
object, there's nothing it can check, so it just calls memcpy() as
normal.
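
To make that concrete, here's a minimal standalone example (not
PostgreSQL code) showing both behaviors. Building with
"gcc -O2 -D_FORTIFY_SOURCE=2 -c fortify_demo.c" and inspecting the
object file shows copy_known() calling __memcpy_chk and copy_unknown()
calling plain memcpy:

/*
 * fortify_demo.c: standalone sketch, not taken from the PostgreSQL
 * sources.
 */
#include <string.h>

char global_buf[64];

/* Destination size known (64 bytes), length only known at runtime:
 * GCC emits a call to the checked __memcpy_chk(). */
void
copy_known(const char *src, size_t len)
{
    memcpy(global_buf, src, len);
}

/* Destination size unknown: __builtin_object_size() is (size_t) -1,
 * so the checked built-in degrades to an ordinary memcpy(). */
void
copy_unknown(char *dst, const char *src, size_t len)
{
    memcpy(dst, src, len);
}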

Knowing the destination buffer size at compile time would be
impossible (before or after my change) because palloc() doesn't have
the alloc_size attribute[5] specified. Even if it were specified (which
I tried), and even if the compiler were smart enough (which it's not),
it could still only come up with a maximum size, because the offset
changes at runtime. Regardless, I tried printing the result of:

    __builtin_object_size(lt->buffer + lt->pos, 0)

and it is always -1 (unknown).
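
For anyone who wants to poke at this without building PostgreSQL, a
standalone analogue of what I described looks something like the
following (hypothetical names, not the exact code I instrumented;
build with "gcc -O2 -D_FORTIFY_SOURCE=2 objsize_demo.c"):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for palloc() with the alloc_size attribute applied. */
extern void *my_alloc(size_t size) __attribute__((malloc, alloc_size(1)));

void *
my_alloc(size_t size)
{
    return malloc(size);
}

typedef struct
{
    char   *buffer;             /* analogue of lt->buffer */
    size_t  pos;                /* analogue of lt->pos */
} FakeTape;

__attribute__((noinline)) static void
tape_write(FakeTape *lt, const void *ptr, size_t len)
{
    /*
     * Prints -1: from here GCC cannot see the allocation, so the
     * destination size is unknown and the fortified check falls back
     * to a plain memcpy().
     */
    printf("objsize = %ld\n",
           (long) __builtin_object_size(lt->buffer + lt->pos, 0));
    memcpy(lt->buffer + lt->pos, ptr, len);
}

int
main(void)
{
    FakeTape    lt;

    lt.buffer = my_alloc(8192);
    lt.pos = 100;
    tape_write(&lt, "hello", 5);
    free(lt.buffer);
    return 0;
}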

I have attached a workaround patch that restores the performance. It's
isolated to logtape.c, but it's ugly (and not just a little bit).

The questions are:

1. Is my analysis correct?
2. What is the scale of this problem? What about other platforms or
compilers? Are there other cases in PostgreSQL that might suffer from
the use of FORTIFY_SOURCE?
3. Even if this is the compiler's fault, should we still fix it?
4. Does the attached fix have any dangers of regressing on other
compilers/platforms?
5. Does anyone have a suggestion for a better fix?

Regards,
Jeff Davis

[1]
https://postgr.es/m/91ca648cfd1f99bf07981487a7d81a1ec926caad.camel@j-davis.com
[2]
https://fedoraproject.org/wiki/Security_Features?rd=Security/Features#Compile_Time_Buffer_Checks_.28FORTIFY_SOURCE.29
[3]
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=24d85952
[4] https://gcc.gnu.org/onlinedocs/gcc/Object-Size-Checking.html
[5]
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-alloc_005fsize-function-attribute

Attachment: fix-regression.patch (text/x-patch, 5.0 KB)
