Skip site navigation (1) Skip section navigation (2)

Re: Inlining comparators as a performance optimisation

From: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Geoghegan <peter(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Inlining comparators as a performance optimisation
Date: 2011-12-06 10:40:23
Message-ID: CAP-rdTb85gGutES_GYUy36XrCTxmBhjwPSh+K+pvkEbBPU_nbg@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-hackers
2011/12/5 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:

> What is bothering me is that this approach is going to cause substantial
> bloat of the executable code, and such bloat has got distributed costs,
> which we don't have any really good way to measure but for sure
> micro-benchmarks addressing only sort speed aren't going to reveal them.
> Cache lines filled with sort code take cycles to flush and replace with
> something else.
>
> I think it's possibly reasonable to have inlined copies of qsort for a
> small number of datatypes, but it seems much less reasonable to have
> multiple copies per datatype in order to obtain progressively tinier
> micro-benchmark speedups.  We need to figure out what the tradeoff
> against executable size really is, but so far it seems like you've been
> ignoring the fact that there is any such tradeoff at all.

[ Randomly spouting ideas here: ]

Might it not be a good idea to decide whether to use the inlined
copies vs. the normal version, based on how much data to sort? Surely
for a very large sort, the cache blow-out doesn't matter that much
relative to the amount of time that can be saved sorting?

Assuming that all types would have an inlined sort function, although
that would indeed result in a larger binary, most of that binary would
never touch the cache, because corresponding large sorts are never
performed. If they would sporadically occur (and assuming the points
at which inline sorting starts to get used are chosen wisely), the
possibly resulting cache blow-out would be a net win.

I am also assuming here that instruction cache lines are small enough
for case line aliasing not to become a problem; putting all sort
functions next to each other in the binary (so that they don't alias
with the rest of the backend code that might be used more often) might
alleviate that.

Nicolas

-- 
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?

In response to

pgsql-hackers by date

Next:From: Magnus HaganderDate: 2011-12-06 11:38:30
Subject: Re: IDLE in transaction introspection
Previous:From: Magnus HaganderDate: 2011-12-06 10:14:02
Subject: Re: xlog location arithmetic

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group