Re: Changing SQL Inlining Behaviour (or...?)

From: Paul Ramsey <pramsey(at)cleverelephant(dot)ca>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Changing SQL Inlining Behaviour (or...?)
Date: 2018-11-09 19:14:38
Message-ID: CACowWR0PtJr2qJo5EUQ=goKA6pcGBq+4dBSDQBGdv_bUaC4XGA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> Neglected to include the footnotes...

As I promised at PgConf.eu, here's a follow-up email about PostGIS
parallelism and function inlining (an odd combination, it is true).

So, context:

- We want PostGIS to parallelize more. In order to achieve that we need to
mark our functions with more realistic COSTs. Much much higher COSTs.
- When we do that, we hit a different problem. Our most commonly used
functions, ST_Intersects(), ST_DWithin() are actually SQL wrapper functions
are more complex combinations of index operators and exact computational
geometry functions.
- In the presence of high cost parameters that are used multiple times in
SQL functions, PostgreSQL will stop inlining those functions, in an attempt
to save the costs of double-calculating the parameters.
- For us, that's the wrong choice, because we lose the index operators at
the same time as we "save" the cost of double calculation.
- We need our wrapper functions inlined, even when they are carrying a high
COST.

At pgconf.eu, I canvassed this problem and some potential solutions:

* Solution #1 - Quick and dirty and visible: Add an 'INLINE' function
decorator, which tells PostgreSQL to just ignore costs and inline the
function regardless. Pros: it's not too hard to implement and I'm happy to
contribute this. Cons: it adds very specific single-purpose syntax to
CREATE FUNCTION.

* Solution #2 - Quick and dirty and invisible. Tom suggested a hack that
achieves the aims of #1 but without adding syntax to CREATE FUNCTION: have
the inlining logic look at the cost of the wrapper and the cost of
parameters, and if the cost of the wrapper "greatly exceeded" the cost of
the parameters, then inline. So the PostGIS project would just set the cost
of our wrappers very high, and we'd get the behaviour we want, while other
users who want to use wrappers to force caching of calculations would have
zero coded wrapper functions. Pros: Solves the problem and easy to
implement, I'm happy to contribute. Cons: it's so clearly a hack involving
hidden (from users) magic.

* Solution #3 - Correct and globally helpful. When first presented with
this problem last year, both Andres and Tom said [1] "but the right fix is
to avoid the double-calculation of identical entries in the target list"
because then it would be safe to inline functions with duplicate expensive
parameters. This would not only address the proximate PostGIS problem but
make a whole class of queries faster. There was some discussion of this
approach last week [2]. Pros: The right thing! Improves a whole pile of
other performance cases. Cons: Hard! Only experienced PgSQL developers need
apply.

Naturally, I would love to see #3 implemented, but there's only so much
experienced developer time to go around, and it's beyond my current skill
set. I would like to be able to start to improve PostGIS parallelism with
PgSQL 12, so in order to make that not impossible, I'd like to implement
either #1 or #2 in case #3 doesn't happen for PgSQL 12.

So my question to hackers is: which is less worse, #1 or #2, to implement
and submit to commitfest, in case #3 does not materialize in time for PgSQL
12?

Thanks!

Paul

[1]
https://www.postgresql.org/message-id/20171116182208.kcvf75nfaldv36uh%40alap3.anarazel.de
[2] https://www.postgresql.org/message-id/10355.1540926295%40sss.pgh.pa.us

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2018-11-09 19:37:10 Re: repeated procedure call error
Previous Message Pavel Stehule 2018-11-09 19:07:58 Re: repeated procedure call error