Re: Inlining comparators as a performance optimisation

From: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Date: 2011-12-02 13:33:30
Message-ID: CAEYLb_VO-mvaWdBvA8LXxxzU_bkLj+WioQxt0exm2s+1MKp19w@mail.gmail.com
Lists: pgsql-hackers

On 2 December 2011 01:13, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Regardless of that, I'm still of the opinion that this patch is
> fundamentally misdesigned.  The way it's set up, it is only possible to
> have a fast path for types that are hard-wired into tuplesort.c, and
> that flies in the face of twenty years' worth of effort to make Postgres
> an extensible system.

What the user doesn't know can't hurt them. I'm not of the opinion
that an internal optimisation flies in the face of anything - is it
really so hard for you to hold your nose for a fairly isolated patch
like this?

> I really don't care about performance
> measurements: this is not an acceptable design, period.

If ever there was a justification for doing something so distasteful,
I would think that a 60% decrease in the time taken to perform a raw
sort for POD types (plus common variants) would be it.

> What I want to see is some mechanism that pushes this out to the
> individual datatypes, so that both core and plug-in datatypes have
> access to the benefits.  There are a number of ways that could be
> attacked, but the most obvious one is to invent additional, optional
> support function(s) for btree opclasses.
>
> I also still think that we should pursue the idea of a lower-overhead
> API for comparison functions.  It may be that it's worth the code bulk
> to have an inlined copy of qsort for a small number of high-usage
> datatypes, but I think there are going to be a lot for which we aren't
> going to want to pay that price, and yet we could get some benefit from
> cutting the call overhead.

I'm not opposed to that. It just seems fairly tangential to what I've
done here, as well as considerably less important to users,
particularly when you look at the numbers I've produced. It would be
nice to get a lesser gain for text and things like that too, but other
than that I don't see a huge demand.

> A lower-overhead API would also save cycles
> in usage of the comparison functions from btree itself (remember that?).

Yes, I remember that. Why not do something similar there, and get
similarly large benefits?

> I think in particular that the proposed approach to multiple sort keys
> is bogus and should be replaced by just using lower-overhead
> comparators.  The gain per added code byte in doing it as proposed
> has got to be too low to be reasonable.

Reasonable by what standard? We may well be better off with
lower-overhead comparators for the multiple-scanKey case (though I
wouldn't like to bet on it), but we would most certainly not be better
off without discriminating between the single-scanKey and
multiple-scanKey cases as I have.

Look at the spreadsheet results_server.ods, and compare the "not
inlined" and "optimized" sheets. It's clear from those numbers that,
through a combination of inlining and of simply not having to consider
a case that will never come up (multiple scanKeys), the inlining
specialisation far outperforms the non-inlining, multiple-scanKey
specialisation for the same data (I simply commented out the inlining
specialisations so that the non-inlining specialisations would always
be called for int4 and friends, even if there was only a single
scanKey).
That's where the really dramatic improvements are seen. This
additional benefit is well worth having, because the case is so common
and the gain so big. While I will be quite happy to pursue a plan like
the one you describe, to bring some of these benefits to all types, I
do not want to jettison the additional benefits described in order to
do so. Isn't that reasonable?

We're talking about taking 6778.302ms off a 20468.0ms query, rather
than 1860.332ms. Not exactly a subtle difference. Assuming that those
figures are representative of the gains to be had from a generic
mechanism that does not inline, or specialise on the number of
scanKeys, are you sure that that's worthwhile?

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
