Re: Inlining functions with "expensive" parameters

From: Andres Freund <andres(at)anarazel(dot)de>
To: Paul Ramsey <pramsey(at)cleverelephant(dot)ca>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>,Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Inlining functions with "expensive" parameters
Date: 2017-11-16 18:22:08
Message-ID: 20171116182208.kcvf75nfaldv36uh@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Hi,

On 2017-11-16 09:37:29 -0800, Paul Ramsey wrote:
> On Thu, Nov 9, 2017 at 12:11 PM, Paul Ramsey <pramsey(at)cleverelephant(dot)ca>
> wrote:
>
> > All,
> >
> > As we try and make PostGIS more "parallel sensitive" we have been added
> > costs to our functions, so that their relative CPU cost is more accurately
> > reflected in parallel plans.
> >
> > This has resulted in an odd side effect: some of our "wrapper" functions
> > stop giving index scans in plans [1]. This is a problem!
> >
> > An example of a "wrapper" is ST_Intersects(geom1, geom2). It combines an
> > index operation (geom1 && geom2) with an exact spatial test
> > (_ST_Intersects(geom1, geom2). This is primarily for user convenience, and
> > has worked for us well for a decade and more. Having this construct stop
> > working is definitely a problem.
> >
> > As we add costs to our functions, the odds increase that one of the
> > parameters to a wrapper might be a costed function. It's not uncommon to
> > see:
> >
> > ST_Interects(geom, ST_SetSRID('POLYGON(...)', 4326))
> >
> > It's fair to say that we really do depend on our wrappers getting inlined
> > basically all the time. They are simple functions, they do nothing other
> > than 'SELECT func1() AND func2() AND arg1 && arg2'.
> >
> > However, once costs are added to the parameters, the inlining can be
> > turned off relatively quickly. Here's a PgSQL native example:
> >
> > -- Create data table and index. Analyze.
> > DROP TABLE IF EXISTS boxen;
> > CREATE TABLE boxen AS
> > SELECT row_number() OVER() As gid,
> > box(point(x, y),point(x+1, y+1)) AS b, x, y
> > FROM generate_series(-100,100) As y, generate_series(-100,100) As
> > x;
> > CREATE INDEX idx_b_geom_gist ON boxen USING gist(b);
> > ANALYZE boxen;
> >
> > -- An inlined function
> > -- When set 'STRICT' it breaks index access
> > -- However 'IMMUTABLE' doesn't seem to bother it
> > CREATE OR REPLACE FUNCTION good_box(box, box)
> > RETURNS boolean
> > AS 'SELECT $1 OPERATOR(&&) $2 AND length(lseg(point($1),point($2)))
> > < 3'
> > LANGUAGE 'sql';
> >
> > -- Start with a low cost circle()
> > ALTER FUNCTION circle(point, double precision) COST 1;
> >
> > -- [A] Query plan hits index
> > EXPLAIN SELECT gid
> > FROM boxen
> > WHERE good_box(
> > boxen.b,
> > box(circle(point(20.5, 20.5), 2))
> > );
> >
> > -- [B] Query plan hits index
> > EXPLAIN SELECT gid
> > FROM boxen,
> > (SELECT x, y FROM boxen WHERE x < 0 and y < 0) AS c
> > WHERE good_box(
> > boxen.b,
> > box(circle(point(c.x, c.y), 2))
> > );
> >
> > -- Increase cost of circle
> > ALTER FUNCTION circle(point, double precision) COST 100;
> >
> > -- [B] Query plan does not hit index!
> > EXPLAIN SELECT gid
> > FROM boxen,
> > (SELECT x, y FROM boxen WHERE x < 0 and y < 0) AS c
> > WHERE good_box(
> > boxen.b,
> > box(circle(point(c.x, c.y), 2))
> > );
> >
> > The inlining is getting tossed out on a test of how expensive the function
> > parameters are [2]. As a result, we lose what is really the correct plan,
> > and get a sequence scan instead of an index scan.
> >
> > The test of parameter cost seems quite old (15+ years) and perhaps didn't
> > anticipate highly variable individual function costs (or maybe it did). As
> > it stands though, PostGIS is currently stuck choosing between having costs
> > on our functions or having our inlined wrappers, because we cannot have
> > both at the same time.
> >
>
> I personally find it hard to make the case for not inlining all the time,
> but that's probably a lack of imagination. The functions in question pass
> all the other tests of "inlinability" the function cost one seems arbitrary.

Well, it's not a question of cost of the function now? Imagine
SELECT inlineable(something());
if you have 10 references for the parameter inside inlineable(). Then
currently something() would be evaluated 10 times. Which'd quite
possibly be bad.

So I'm not convinced we can just throw out the cost checks and be done
with it.

But what I *am* wondering about, is why we're not handling the
parameters in a different way. Instead of replacing the all parameter
references with the parameter, it shouldn't be too hard to instead
replace them with a new PARAM_EXEC like Param.

Besides reducing the cost barrier for inlining, that should also quite
drastically expand the set of inlinable cases because volatile functions
etc aren't a problem anymore.

Presumably we'd want either to continue replacing single references to
parameters like we currently are. But maybe we could store enough
information and make that just part of expression simplification?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Janes 2017-11-16 18:26:59 bgwriter_lru_maxpages range in postgresql.conf
Previous Message Robert Haas 2017-11-16 17:57:49 Re: [HACKERS] Inconsistencies between pg_settings and postgresql.conf