Inlining functions with "expensive" parameters

From: Paul Ramsey <pramsey(at)cleverelephant(dot)ca>
To: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Inlining functions with "expensive" parameters
Date: 2017-11-09 20:11:49
Message-ID: CACowWR2kuB_yApPhB=zUQ_rKqN5NpdAvNfQqYZ0PhRPBVCbz6g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

All,

As we try and make PostGIS more "parallel sensitive" we have been added
costs to our functions, so that their relative CPU cost is more accurately
reflected in parallel plans.

This has resulted in an odd side effect: some of our "wrapper" functions
stop giving index scans in plans [1]. This is a problem!

An example of a "wrapper" is ST_Intersects(geom1, geom2). It combines an
index operation (geom1 && geom2) with an exact spatial test
(_ST_Intersects(geom1, geom2). This is primarily for user convenience, and
has worked for us well for a decade and more. Having this construct stop
working is definitely a problem.

As we add costs to our functions, the odds increase that one of the
parameters to a wrapper might be a costed function. It's not uncommon to
see:

ST_Interects(geom, ST_SetSRID('POLYGON(...)', 4326))

It's fair to say that we really do depend on our wrappers getting inlined
basically all the time. They are simple functions, they do nothing other
than 'SELECT func1() AND func2() AND arg1 && arg2'.

However, once costs are added to the parameters, the inlining can be turned
off relatively quickly. Here's a PgSQL native example:

-- Create data table and index. Analyze.
DROP TABLE IF EXISTS boxen;
CREATE TABLE boxen AS
SELECT row_number() OVER() As gid,
box(point(x, y),point(x+1, y+1)) AS b, x, y
FROM generate_series(-100,100) As y, generate_series(-100,100) As x;
CREATE INDEX idx_b_geom_gist ON boxen USING gist(b);
ANALYZE boxen;

-- An inlined function
-- When set 'STRICT' it breaks index access
-- However 'IMMUTABLE' doesn't seem to bother it
CREATE OR REPLACE FUNCTION good_box(box, box)
RETURNS boolean
AS 'SELECT $1 OPERATOR(&&) $2 AND length(lseg(point($1),point($2)))
< 3'
LANGUAGE 'sql';

-- Start with a low cost circle()
ALTER FUNCTION circle(point, double precision) COST 1;

-- [A] Query plan hits index
EXPLAIN SELECT gid
FROM boxen
WHERE good_box(
boxen.b,
box(circle(point(20.5, 20.5), 2))
);

-- [B] Query plan hits index
EXPLAIN SELECT gid
FROM boxen,
(SELECT x, y FROM boxen WHERE x < 0 and y < 0) AS c
WHERE good_box(
boxen.b,
box(circle(point(c.x, c.y), 2))
);

-- Increase cost of circle
ALTER FUNCTION circle(point, double precision) COST 100;

-- [B] Query plan does not hit index!
EXPLAIN SELECT gid
FROM boxen,
(SELECT x, y FROM boxen WHERE x < 0 and y < 0) AS c
WHERE good_box(
boxen.b,
box(circle(point(c.x, c.y), 2))
);

The inlining is getting tossed out on a test of how expensive the function
parameters are [2]. As a result, we lose what is really the correct plan,
and get a sequence scan instead of an index scan.

The test of parameter cost seems quite old (15+ years) and perhaps didn't
anticipate highly variable individual function costs (or maybe it did). As
it stands though, PostGIS is currently stuck choosing between having costs
on our functions or having our inlined wrappers, because we cannot have
both at the same time.

Any thoughts?

Thanks!

P.

[1] https://trac.osgeo.org/postgis/ticket/3675#comment:18
[2]
https://github.com/postgres/postgres/blob/ae20b23a9e7029f31ee902da08a464d968319f56/src/backend/optimizer/util/clauses.c#L4581-L4584

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2017-11-09 20:12:02 Re: proposal: psql command \graw
Previous Message Fabien COELHO 2017-11-09 20:03:59 Re: proposal: psql command \graw