Re: function calls optimization

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Andrzej Barszcz <abusinf(at)gmail(dot)com>
Subject: Re: function calls optimization
Date: 2019-10-31 15:20:39
Message-ID: 44DED8BB-135C-4A08-A4D0-870D22E2F412@anarazel.de
Lists: pgsql-hackers

Hi,

On October 31, 2019 8:06:50 AM PDT, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>Andres Freund <andres(at)anarazel(dot)de> writes:
>> On October 31, 2019 7:45:26 AM PDT, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> We've typically supposed that the cost of searching for duplicate
>>> subexpressions would outweigh the benefits of sometimes finding them.
>
>> Based on profiles I've seen I'm not sure that's the right choice.
>> Both for when the calls are expensive (say postgis stuff), and for when
>> a lot of rows are processed.
>
>Yeah, if your mental model of a function call is some remarkably
>expensive PostGIS geometry manipulation, it's easy to justify doing a
>lot of work to try to detect duplicates. But most functions in most
>queries are more like int4pl or btint8cmp, and it's going to be
>extremely remarkable if you can make back the planner costs of checking
>for duplicate usages of those.

Well, if it's an expression containing those individually cheap calls, evaluated in a seqscan over a large table below an aggregate, it'd likely still be a win. But we don't, to my knowledge, really have a good way to model optimizations like this, which should only be done if the expressions are either expensive or evaluated with a high loop count.
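To make that trade-off concrete, here is a minimal Python sketch of the kind of gating rule described: only eliminate a duplicate if the evaluations it saves, multiplied by the loop count, exceed a one-off detection cost. `DupCandidate`, `elimination_pays_off` and all cost figures are hypothetical; 0.0025 is PostgreSQL's default `cpu_operator_cost`, but the detection cost and the whole formula are assumptions, not a planner API.

```python
from dataclasses import dataclass


@dataclass
class DupCandidate:
    """Hypothetical planner-side estimates for one duplicated subexpression."""
    per_eval_cost: float  # cost of a single evaluation
    occurrences: int      # how often it appears in the node's expressions
    loop_count: float     # estimated rows processed (times rescans)


def elimination_pays_off(c: DupCandidate, detection_cost: float) -> bool:
    """Return True if computing the subexpression once per row is a net win.

    Every occurrence beyond the first is saved once per processed row, so
    the saving grows with loop_count, while the cost of searching for the
    duplicate is paid once at plan time.
    """
    redundant = max(c.occurrences - 1, 0)
    saved = c.per_eval_cost * redundant * c.loop_count
    return saved > detection_cost


DETECTION = 5.0  # assumed one-off cost of the duplicate search

# Cheap int4pl-like call (0.0025 = default cpu_operator_cost), used twice,
# in a seqscan over ~10M rows feeding an aggregate.
big_scan = DupCandidate(per_eval_cost=0.0025, occurrences=2, loop_count=10_000_000)
# The same expression evaluated for a single row.
one_row = DupCandidate(per_eval_cost=0.0025, occurrences=2, loop_count=1)
# An expensive PostGIS-style call used three times over only 10 rows.
costly = DupCandidate(per_eval_cost=50.0, occurrences=3, loop_count=10)

print(elimination_pays_off(big_scan, DETECTION))  # True
print(elimination_pays_off(one_row, DETECTION))   # False
print(elimination_pays_off(costly, DETECTION))    # True
```

The interesting factor is `loop_count`: the same cheap call flips from "not worth it" to "worth it" purely because of how many rows it runs over, which ties in with the idea of deciding this late, once costs are known.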

I guess one ugly way to deal with this would be to eliminate redundancies very late, e.g. during setrefs (where a better data structure for matching expressions would be good anyway), since at that point we already know all the costs.

But ideally we would want to be able to take such savings into account earlier, when considering different paths. However, I suspect the late approach might be a good enough vehicle to tackle the rest of the work, at least initially.

We could also "just" do such elimination during expression "compilation", but it'd be better to not have to do something as complicated as this for every execution of a prepared statement.

>> I think one part of doing this in a realistic manner is an efficient
>> search for redundant expressions. The other, also non-trivial, is how
>> to even represent references to the results of expressions in other
>> parts of the expression tree / other expressions.
>
>Yup, both of those would be critical to do right.
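A minimal sketch of both pieces, assuming a toy expression representation: hash-consing finds structurally identical subtrees with a dictionary lookup, and flattening the trees into numbered result slots gives a way to refer to an already-computed value from elsewhere in the same tree or from another expression. `Var`, `Const` and `Func` are loosely named after PostgreSQL's `Var`, `Const` and `FuncExpr` nodes but are hypothetical stand-ins, as is `intern_exprs`.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Func:
    fname: str
    args: tuple  # tuple of Var | Const | Func


Expr = Union[Var, Const, Func]


def intern_exprs(roots: list) -> tuple:
    """Flatten several expression trees into one list of numbered steps.

    Structurally equal subtrees compare and hash equal, so the dict below
    detects duplicates; each distinct subexpression gets exactly one result
    slot, and every repeat becomes a reference to that slot instead of a
    second evaluation. Returns (steps, root_slots).
    """
    slot_of: dict = {}  # Expr -> slot number (the hash-consing table)
    steps: list = []    # steps[i] computes the value stored in slot i

    def visit(e: Expr) -> int:
        if e in slot_of:  # duplicate subexpression: reuse its slot
            return slot_of[e]
        if isinstance(e, Func):
            # Arguments are visited first, so they always get lower slots.
            step = ("call", e.fname, tuple(visit(a) for a in e.args))
        elif isinstance(e, Var):
            step = ("var", e.name)
        else:
            step = ("const", e.value)
        slot_of[e] = len(steps)
        steps.append(step)
        return slot_of[e]

    return steps, [visit(r) for r in roots]


a, b = Var("a"), Var("b")
total = Func("int4pl", (a, b))             # a + b
qual = Func("int4gt", (total, Const(10)))  # WHERE a + b > 10
proj = Func("int4mul", (total, Const(2)))  # SELECT (a + b) * 2
steps, root_slots = intern_exprs([qual, proj])
print(root_slots)  # [4, 6]
```

Here `int4pl` ends up as a single step whose slot both the qual and the projection reference. One caveat: a frozen dataclass recomputes its hash by walking the whole subtree, so each lookup costs time proportional to the subtree size; a real implementation would want a cached per-node hash, which relates to the efficiency concern in the quoted text.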

Potentially related note: for nodes like seqscan, combining the qual and projection processing into one expression seems to be a noticeable win (at least when taking care to emit two different sets of deform expression steps). I wonder if something like that would avoid the need for cross-expression value passing in enough places.
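A toy illustration of why a combined program can help, assuming a made-up step format rather than PostgreSQL's actual `ExprEvalStep` machinery: the qual and the projection share one array of result slots, so `a + b` computed for the qual is reused by the projection without any cross-expression plumbing, and filtered rows never reach the projection steps. The step kinds, `FUNCS` and `run` are hypothetical, and the separate deform-step sets mentioned above are not modeled (the `var` steps simply read a dict).

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical function table standing in for int4pl etc.
FUNCS: Dict[str, Callable[..., object]] = {
    "int4pl": lambda x, y: x + y,
    "int4gt": lambda x, y: x > y,
    "int4mul": lambda x, y: x * y,
}


def run(program: list, row: dict, out_slots: Tuple[int, ...],
        calls: Dict[str, int]) -> Optional[tuple]:
    """Evaluate one combined qual + projection program for a single row.

    Steps write to result slots indexed by step position. A ("qual", i)
    step rejects the row if slot i is false, so the projection steps after
    it never run for filtered rows. Returns the projected tuple or None.
    """
    slots: list = [None] * len(program)
    for i, step in enumerate(program):
        kind = step[0]
        if kind == "var":        # stands in for deforming a column
            slots[i] = row[step[1]]
        elif kind == "const":
            slots[i] = step[1]
        elif kind == "call":
            _, fname, arg_slots = step
            calls[fname] = calls.get(fname, 0) + 1  # count invocations
            slots[i] = FUNCS[fname](*(slots[j] for j in arg_slots))
        elif kind == "qual":
            if not slots[step[1]]:
                return None      # filtered out: skip the projection
    return tuple(slots[j] for j in out_slots)


# WHERE a + b > 10, SELECT (a + b) * 2 as a single program.
program = [
    ("var", "a"),                 # 0
    ("var", "b"),                 # 1
    ("call", "int4pl", (0, 1)),   # 2: a + b, evaluated once per row
    ("const", 10),                # 3
    ("call", "int4gt", (2, 3)),   # 4: the qual expression
    ("qual", 4),                  # 5: stop here if the qual is false
    ("const", 2),                 # 6
    ("call", "int4mul", (2, 6)),  # 7: projection reuses slot 2
]

calls: Dict[str, int] = {}
rows = [{"a": 3, "b": 4}, {"a": 8, "b": 5}]
results = [run(program, r, (7,), calls) for r in rows]
print(results)  # [None, (26,)]
print(calls)    # {'int4pl': 2, 'int4gt': 2, 'int4mul': 1}
```

With separate qual and projection expressions, `a + b` would be evaluated once per row for the qual and again for each row that passes, i.e. three `int4pl` calls here instead of two.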

Andres
