Re: Inlining comparators as a performance optimisation

From: "Pierre C" <lists(at)peufeu(dot)com>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Peter Geoghegan" <peter(at)2ndquadrant(dot)com>, "PG Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Inlining comparators as a performance optimisation
Date: 2012-01-13 09:48:56
Message-ID: op.v70n7uhgeorkce@apollo13
Lists: pgsql-hackers
On Wed, 21 Sep 2011 18:13:07 +0200, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> On 21.09.2011 18:46, Tom Lane wrote:
>>> The idea that I was toying with was to allow the regular SQL-callable
>>> comparison function to somehow return a function pointer to the
>>> alternate comparison function,
>> You could have a new function with a pg_proc entry, that just returns a
>> function pointer to the qsort-callback.
> Yeah, possibly.  That would be a much more invasive change, but cleaner
> in some sense.  I'm not really prepared to do all the legwork involved
> in that just to get to a performance-testable patch though.

A few years ago I looked for a way to speed up COPY operations, and it
turned out that COPY TO had a good optimization opportunity. At that time,
for each datum, COPY TO would:

- test for nullness
- call an outfunc through fmgr
- outfunc palloc()s a bytea or text, fills it with data, and returns it
(sometimes it uses an extensible string buffer which may be repalloc()d
several times)
- COPY memcpy()s returned data to a buffer and eventually flushes the  
buffer to client socket.

I introduced a special write buffer with an on-flush callback (i.e., a close
relative of the existing string buffer); in this case the callback was
"flush to client socket". I also added several outfuncs (one per type) which
took that buffer as an argument in addition to the datum to output. Each one
simply put the datum into the buffer, with the appropriate transformations
(like converting to bytea or text), and flushed if needed.

Then the COPY TO BINARY of a constant-size datum would turn into:
- one test for nullness
- one C function call
- one test to ensure appropriate space is available in the buffer (flush if
needed)
- one htonl() and memcpy of constant size, which the compiler turns
into a couple of simple instructions
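
To make the steps above concrete, here is a minimal sketch of such a write buffer and one buffer-writing outfunc. This is not the original patch: WriteBuf, wb_put, out_int32_binary and the counting sink are hypothetical names made up for illustration, and the output is just the bare network-order value, not the real COPY BINARY field layout. It assumes a POSIX system for htonl().

```c
#include <arpa/inet.h>  /* htonl(); POSIX */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical write buffer with an on-full callback. */
typedef struct WriteBuf WriteBuf;
struct WriteBuf {
    char   *data;                                 /* backing storage */
    size_t  cap;                                  /* total capacity */
    size_t  len;                                  /* bytes currently used */
    void  (*on_full)(WriteBuf *buf, size_t need); /* must make room */
    void   *ctx;                                  /* callback state */
};

/* Inlined so that a constant n lets the compiler reduce memcpy to moves. */
static inline void wb_put(WriteBuf *buf, const void *src, size_t n)
{
    if (buf->cap - buf->len < n)
        buf->on_full(buf, n);
    assert(buf->cap - buf->len >= n);
    memcpy(buf->data + buf->len, src, n);
    buf->len += n;
}

/* Buffer-writing outfunc for a fixed-size type: one htonl, one 4-byte copy. */
static inline void out_int32_binary(WriteBuf *buf, int32_t value)
{
    uint32_t net = htonl((uint32_t) value);
    wb_put(buf, &net, sizeof net);
}

/* Stand-in for "flush to client socket": count the bytes, reset the buffer. */
typedef struct { size_t flushed; } CountingSink;

static void flush_to_sink(WriteBuf *buf, size_t need)
{
    CountingSink *sink = buf->ctx;
    (void) need;              /* a flush frees the whole buffer */
    sink->flushed += buf->len;
    buf->len = 0;
}
```

A caller would set up one WriteBuf per COPY, point on_full at the socket flush routine, and call the per-type function once per non-null datum.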

I recall measuring speedups of 2x - 8x on COPY BINARY, less for text, but  
still large gains.

Although eliminating fmgr call and palloc overhead was an important part  
of it, another large part was getting rid of memcpy()'s which the compiler  
turned into simple movs for known-size types, a transformation that can be  
done only if the buffer write functions are inlined inside the outfuncs.  
Compilers love constants...

Additionally, code size growth was minimal since I moved the old outfuncs  
code into the new outfuncs, and replaced the old fmgr-callable outfuncs  
with "create buffer with on-full callback=extend_and_repalloc() - pass to  
new outfunc(buffer,datum) - return buffer". Which is basically equivalent  
to the previous palloc()-based code, maybe with a few extra instructions.
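
A rough sketch of that wrapper arrangement, under loud assumptions: malloc/realloc stand in for palloc/repalloc, the fmgr calling convention is left out entirely, and every name (WriteBuf, extend_and_realloc, int32_out_buf, int32_out_alloc) is hypothetical. The point is only that the allocating function becomes a thin shell around the buffer-writing one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer with an on-full callback. */
typedef struct WriteBuf WriteBuf;
struct WriteBuf {
    char   *data;
    size_t  cap;
    size_t  len;
    void  (*on_full)(WriteBuf *buf, size_t need);
};

/* On-full callback for the allocating path: grow instead of flushing. */
static void extend_and_realloc(WriteBuf *buf, size_t need)
{
    size_t newcap = buf->cap ? buf->cap : 16;
    while (newcap - buf->len < need)
        newcap *= 2;
    char *p = realloc(buf->data, newcap);
    if (p == NULL)
        abort();
    buf->data = p;
    buf->cap = newcap;
}

static inline void wb_put(WriteBuf *buf, const void *src, size_t n)
{
    if (buf->cap - buf->len < n)
        buf->on_full(buf, n);
    memcpy(buf->data + buf->len, src, n);
    buf->len += n;
}

/* New-style outfunc: all the formatting logic lives here. */
static inline void int32_out_buf(WriteBuf *buf, int32_t value)
{
    char tmp[12];   /* "-2147483648" plus NUL */
    int  n = snprintf(tmp, sizeof tmp, "%d", (int) value);
    assert(n > 0 && (size_t) n < sizeof tmp);
    wb_put(buf, tmp, (size_t) n);
}

/* Old-style entry point: create buffer, call new outfunc, return buffer. */
static char *int32_out_alloc(int32_t value)
{
    WriteBuf buf = { NULL, 0, 0, extend_and_realloc };
    int32_out_buf(&buf, value);
    wb_put(&buf, "", 1);   /* NUL terminator */
    return buf.data;       /* caller frees */
}
```

The bulk-output path would call int32_out_buf directly with a flushing buffer, so both paths share one copy of the formatting code.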

When I submitted the patch for review, Tom rightfully pointed out that my  
way of obtaining the C function pointer sucked very badly (I don't  
remember how I did it, only that it was butt-ugly) but the idea was to get  
a quick measurement of what could be gained, and the result was positive.  
Unfortunately I had no time available to finish it and make it into a real
patch; I'm sorry about that.

So why am I posting in this sorting thread? It seems that by bypassing fmgr
for functions which are small, simple, and called lots of times, there is a
large gain to be made, not only because of fmgr overhead but also because
of the opportunity for new compiler optimizations, palloc removal, etc.
However, in my experiment the arguments and return types of the new  
functions were DIFFERENT from the old functions : the new ones do the same  
thing, but in a different manner. One manner was suited to sql-callable  
functions (ie, palloc and return a bytea) and another one to writing large  
amounts of data (direct buffer write). Since both have very different  
requirements, being fast at both is impossible for the same function.

Anyway, all that rant boils down to :

Some functions could benefit from having two versions (while sharing almost
all the code between them):
- User-callable (fmgr) version (current one)
- C-callable version, usually with different parameters and return type

And it would be cool to have a way to grab a bare function pointer on the  
second one.

Maybe an extra column in pg_proc would do (but then, the proargtypes and  
friends would describe only the sql-callable version) ?
Or an extra table ? pg_cproc ?
Or an in-memory hash : hashtable[ fmgr-callable function ] => C version
- What happens if a C function has no SQL-callable equivalent ?
Or (ugly) introduce an extra per-type function type_get_function_ptr(  
function_kind ) which returns the requested function ptr
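
For illustration only, the in-memory hash option might look something like the sketch below. It is a toy fixed-size table, not a proposal for how the backend should do it: the integer key standing in for "the fmgr-callable function", the table size, and all names (cproc_register, cproc_lookup, int32_fast_cmp) are made up. The caller has to know the real signature and cast the generic pointer back to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Generic function pointer; callers cast back to the real type. */
typedef void (*generic_fn)(void);

#define CPROC_SLOTS 64   /* power of two, so masking works as modulo */

typedef struct {
    uint32_t   key;      /* id of the SQL-callable function; 0 = empty */
    generic_fn fn;       /* its C-callable counterpart */
} CProcSlot;

static CProcSlot cproc_table[CPROC_SLOTS];

/* Insert or overwrite; returns 1 on success, 0 if the table is full. */
static int cproc_register(uint32_t sql_fn_id, generic_fn fn)
{
    assert(sql_fn_id != 0);
    for (uint32_t i = 0; i < CPROC_SLOTS; i++) {
        CProcSlot *s = &cproc_table[(sql_fn_id + i) & (CPROC_SLOTS - 1)];
        if (s->key == 0 || s->key == sql_fn_id) {
            s->key = sql_fn_id;
            s->fn = fn;
            return 1;
        }
    }
    return 0;
}

/* Linear probing; NULL means "no C version, fall back to fmgr". */
static generic_fn cproc_lookup(uint32_t sql_fn_id)
{
    if (sql_fn_id == 0)
        return NULL;
    for (uint32_t i = 0; i < CPROC_SLOTS; i++) {
        const CProcSlot *s = &cproc_table[(sql_fn_id + i) & (CPROC_SLOTS - 1)];
        if (s->key == sql_fn_id)
            return s->fn;
        if (s->key == 0)
            return NULL;
    }
    return NULL;
}

/* Example C-callable version with a different signature than fmgr's. */
typedef int (*int32_cmp_fn)(int32_t a, int32_t b);

static int int32_fast_cmp(int32_t a, int32_t b)
{
    return (a > b) - (a < b);
}
```

A sort or COPY setup routine would do one lookup per column up front and then call through the bare pointer in the inner loop, falling back to the fmgr path when the lookup returns NULL.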

If one of those happens, I'll dust off my old copy-optimization patch ;)

Hmm... just my 2c

