Re: Inlining comparators as a performance optimisation

From: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)mit(dot)edu>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Inlining comparators as a performance optimisation
Date: 2011-11-21 02:02:06
Message-ID: CAEYLb_XX-k5PCkmTvS5ibi=KC2ZW10=mjTK1N0dg4kKm5ZsZDA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Part of my problem with not producing specialisations that I really
neglected to complain about until now is the inconsistency between
types, and the need to deal with that.

We benefit from assuming in our specialisation that we're only dealing
with POD types, simply not considering things that we would otherwise
have had to for the benefit of types that have a Datum representation
that is pass-by-reference, or have collations, or have multiple
scankeys. Leaving aside the question of straight compiler-optimisation
benefits for the moment, the remaining benefit of what I've done comes
not so much from avoiding the usual function call machinery per se, as
from doing so *as well as* cutting down on what currently happens in
comparetup_heap to handle every single compound and scalar type.
Compare the function comparetup_heap with my meta-function
TYPE##AppSort to see what I mean. The function comparetup_heap is the
comparator directly used by qsort_arg when sorting heap tuples, and
qsort_arg outsources to comparetup_heap some things that you might not
expect it to (very little has changed about qsort_arg since we lifted
it from NetBSD back in 2006). So while you might imagine that that
loop in comparetup_heap and things like its use of the heap_getattr
macros won't expand to that many instructions, you really don't want
them in the inner loop of a long operation like qsorting, paid for up
to O(n ^ 2) times. There's also the obvious implications for compiler
optimisations, particularly relating to effective usage of CPU cache.

I'm trying to get you what you asked for: A straight choice between
what Tom suggested and what I suggested, with perhaps some compromises
between the two positions. That's sort of tricky though, especially
considering the issues raised above.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2011-11-21 03:24:16 Re: Singleton range constructors versus functional coercion notation
Previous Message Andreas 'ads' Scherbaum 2011-11-20 22:54:45 FOSDEM 2012 - PostgreSQL Devroom: Call for Speakers