Re: Inlining comparators as a performance optimisation

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Peter Geoghegan <peter(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)mit(dot)edu>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Inlining comparators as a performance optimisation
Date: 2011-11-18 13:39:38
Message-ID: CA+TgmoYusPUmR2yNjrryHStAu7oeYVpWxV8NmPm_tdT-_jJPSw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Nov 18, 2011 at 3:53 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> We have no proof that we need to do this for 10 or 100 data types. We
> only currently have proof that there is gain for the most common
> types.

Well, that's kind of my point. I think this needs more work before we
decide what the best approach is. So far, the ONLY test result we
have that compares inlining to not inlining shows a speedup from 60 ms
to 52 ms. I think that an 8 ms speedup on one test with one datatype
on one platform/compiler combination isn't sufficient evidence to
conclude that this is the best possible approach.

I think the way to look at this is that this patch contains two
possibly good ideas: one of which is to make the qsort() argument
match the qsort() calling convention, and the other of which is to
have multiple copies of the qsort() logic that inline the comparators
for their respective datatypes. Tom hypothesized that most of the
benefit was in the first idea, and the numbers that Peter posted seem
to support that conclusion. The first idea is also less invasive and
more broadly applicable, so to me that seems like the first thing to
pursue.

Now, that doesn't mean that we shouldn't consider the second idea as
well, but I don't believe that the evidence presented thus far is
sufficient to prove that we should go that route. It seems entirely
possible that inlining any non-trivial comparator function could work
out to a loss, or that the exact choice of compiler flags could affect
whether inlining works out to a plus or a minus (what happens with -O3
vs. -O2? what about -O1? what about -O2 -fno-omit-frame-pointer?
what if they're using HP's aCC or Intel's icc rather than gcc?).
There's no point in installing an optimization that could easily be a
pessimization on some other workload, and that hasn't been tested, or
at least no results have been posted here. On the other hand,
matching the calling convention of the comparator function to what
qsort() wants and eliminating the trampoline seems absolutely certain
to be a win in every case, and based on the work Peter has done it
seems like it might be a quite large win.

In fact, you have to ask yourself just exactly how much our
function-calling convention is costing us in general. We use that
mechanism an awful lot and whatever loss of cycles is involved would
be spread all over the code base where oprofile or gprof won't easily
be able to pick it out. Even the cost at any particular call site
will be split between the caller and the callee. There might be more
stuff we could optimize here than just sorting...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Thom Brown 2011-11-18 13:51:16 Re: VACUUM touching file but not updating relation
Previous Message Alvaro Herrera 2011-11-18 13:38:01 Re: pgsql: Do missed autoheader run for previous commit.