On 5 December 2011 13:23, Heikki Linnakangas
> I'm still not convinced the big gain is in inlining the comparison
> functions. Your patch contains a few orthogonal tricks, too:
> * A fastpath for the case of just one scankey. No reason we can't do that
> without inlining.
True, though Tom did seem to not like that idea very much.
> * inlining the swap-functions within qsort. Thaẗ́'s different from inlining
> the comparison functions, and could be done regardless of the data type. The
> qsort element size in tuplesort.c is always sizeof(SortTuple)
Sure. I think that Tom mostly objects to hard-coded intelligence about
which datatypes are used, rather than specialisations per se.
> And there's some additional specializations we can do, not implemented in
> your patch, that don't depend on inlining the comparison functions:
> * A fastpath for the case of no NULLs
> * separate comparetup functions for ascending and descending sorts (this
> allows tail call elimination of the call to type-specific comparison
> function in the ascending version)
> * call CHECK_FOR_INTERRUPTS() less frequently.
All of those had occurred to me, except the last - if you look at the
definition of that macro, it looks like more trouble than it's worth
to do less frequently. I didn't revise my patch with them though,
because the difference that they made, while clearly measurable,
seemed fairly small, or at least relatively so. I wasn't about to add
additional kludge for marginal benefits given the existing
controversy, though I have not dismissed the idea entirely - it could
be important on some other machine.
> Perhaps you can get even more gain by adding the no-NULLs, and asc/desc
> special cases to your inlining patch, too, but let's squeeze all the fat out
> of the non-inlined version first.
As I said, those things are simple enough to test, and were not found
to make a significant difference, at least to my exact test case +
> One nice aspect of many of these
> non-inlining optimizations is that they help with external sorts, too.
I'd expect the benefits to be quite a lot smaller there, but fair point.
Results from running the same test on your patch are attached. I think
that while you're right to suggest that the inlining of the comparator
proper, rather than any other function or other optimisation isn't
playing a dominant role in these optimisations, I think that you're
underestimating the role of locality of reference. To illustrate this,
I've also included results for when I simply move the comparator to
another translation unit, logtape.c . There's clearly a regression
from doing so (I ran the numbers twice, in a clinical environment).
The point is, there is a basically unknown overhead paid for not
inlining - maybe inlining is not worth it, but that's a hard call to
I didn't try to make the numbers look worse by putting the comparator
proper in some other translation unit, but it may be that I'd have
shown considerably worse numbers by doing so.
Why the strong aversion to what I've done? I accept that it's ugly,
but is it really worth worrying about that sort of regression in
maintainability for something that was basically untouched since 2006,
and will probably remain untouched after this work concludes, whatever
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
In response to
pgsql-hackers by date
|Next:||From: Marti Raudsepp||Date: 2011-12-05 17:16:24|
|Subject: Re: [PATCH] Caching for stable expressions with constant arguments v3|
|Previous:||From: Kevin Grittner||Date: 2011-12-05 17:09:06|
|Subject: Re: [REVIEW] Patch for cursor calling with named parameters|