Re: Inlining comparators as a performance optimisation

From: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Inlining comparators as a performance optimisation
Date: 2011-09-20 23:53:16
Message-ID: CAEYLb_Us1cSxn94G2CyhSp+r1TxrghV6sj70htAtwL1K5cWLHg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 20 September 2011 03:51, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Considering that -O2 is our standard optimization level, that
> observation seems to translate to "this patch will be useless in
> practice".  I think you had better investigate that aspect in some
> detail before spending more effort.

I don't think that the fact that that happens is at all significant at
this early stage, and it never even occurred to me that you'd think
that it might be. I was simply disclosing a quirk of this POC patch.
The workaround is probably to use a macro instead. For the benefit of
those that didn't follow the other threads, the macro-based qsort
implementation, which I found to perform significantly better than
regular qsort(), runs like this on my laptop when I built at 02 with
GCC 4.6 just now:

C stdlib quick-sort time elapsed: 2.092451 seconds
Inline quick-sort time elapsed: 1.587651 seconds

Does *that* look attractive to you? I've attached source code of the
program that produced these figures, which has been ported to C from
C++.

When I #define LARGE_SIZE 100000000, here's what I see:

[peter(at)peter inline_compar_test]$ ./a.out
C stdlib quick-sort time elapsed: 23.659411 seconds
Inline quick-sort time elapsed: 18.470611 seconds

Here, sorting with the function pointer/stdlib version takes about
1.28 times as long. In the prior test (with the smaller LARGE_SIZE),
it took about 1.32 times as long. Fairly predictable, linear, and not
to be sniffed at.

The variance I'm seeing across runs is low - a couple of hundredths of
a second at most. This is a Fedora 15 " Intel(R) Core(TM) i5-2540M CPU
@ 2.60GHz" machine. I'm not sure right now why the inline quick-sort
is less of a win than on my old Fedora 14 desktop (where it was 3.24
Vs 2.01), but it's still a significant win. Perhaps others can build
this simple program and tell me what they come up with.

>> This performance patch differs from most in that it's difficult in
>> principle to imagine a performance regression occurring.
>
> Really?  N copies of the same code could lead to performance loss just
> due to code bloat (ie, less of a query's inner loops fitting in CPU
> cache).

I did consider that. Of course inlining has an overhead, and I'll be
testing that each instance of inlining has a net benefit. I just meant
that many other performance patches have an obvious worst case, and I
think that it is policy to focus on that case, but I can't think of
one here. Does anyone else have any ideas?

> Not to mention the clear regression in maintainability.  So
> I'm disinclined to consider this sort of change without a significantly
> bigger win than you're suggesting above

Sure, there'll be some sort of regression in maintainability - I think
that HOT had a clear regression in maintainability too. The important
questions are obviously "how big is the loss of maintainability?", and
"is it worth it?". We'll know more when this work is actually shaped
into a proper patch. Perhaps I should have waited until I had
something along those lines before making an announcement, but I
wanted community input as early as possible. I think that there's
plenty of tweaking that can be done to get additional performance
improvements - all I've done so far is demonstrate that those
improvements are real and worth thinking about, in the fastest
possible way, partly because you expressed skepticism of the benefits
of inlining comparators to Greg Stark in an earlier thread.

Performance and maintainability are often somewhat in tension, but we
cannot ignore performance. If this work can bring us an improvement in
performance approaching the isolated macro Vs qsort() function pointer
benchmark, that's a *big* win. Sorting integers and floats is very
common and important.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Attachment Content-Type Size
inline_compar_test.tar.gz application/x-gzip 4.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2011-09-21 00:19:52 Re: EXPLAIN and nfiltered, take two
Previous Message Tom Lane 2011-09-20 22:06:55 Re: PostgreSQL X/Open Socket / BSD Socket Issue on HP-UX