Re: overhead due to casting extra parameters with aggregates (over and over)

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: overhead due to casting extra parameters with aggregates (over and over)
Date: 2019-09-23 17:44:47
Message-ID: 20190923174447.lncaq7g5l244azia@development
Lists: pgsql-hackers

On Mon, Sep 23, 2019 at 12:53:36PM -0400, Tom Lane wrote:
>Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
>> I've been working on a custom aggregate, and I've run into some fairly
>> annoying overhead due to casting direct parameters over and over. I'm
>> wondering if there's a way to eliminate this, somehow, without having to
>> do an explicit cast.
>
>> Imagine you have a simple aggregate:
>
>> CREATE AGGREGATE tdigest_percentile(double precision, int, double precision[])
>> (
>> ...
>> );
>
>> with two direct parameters (actually, I'm not sure that's the correct
>> term, because this is not an ordered-set aggregate and [1] only talks
>> about direct parameters in that context). Anyway, I'm talking about the
>> extra parameters, after the 'double precision' value to aggregate.
>
>But you're not telling the system that those are direct parameters,
>at least not if you mean that they can only legitimately have one value
>across the whole query. As-is, they're just more aggregated arguments
>so we have to evaluate them again at each row.
>

Understood.
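
To make this concrete, the pattern I ran into looks roughly like this
(a simplified sketch - tdigest_percentile is the aggregate from above,
but the table, column, and CTE names are made up):

-- fractions comes out of the CTE as numeric[]; the cast to
-- double precision[] sits outside the sub-select, so it is part of the
-- aggregate's argument expressions and gets re-evaluated for every row
WITH pct AS (
    SELECT array_agg(x) AS fractions        -- numeric[], not float8[]
    FROM unnest(ARRAY[0.5, 0.9, 0.99]) AS x
)
SELECT tdigest_percentile(v, 100,
                          (SELECT fractions FROM pct)::double precision[])
FROM data;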

>It's fairly messy that the SQL spec ties direct arguments to ordered-set
>aggregates; you'd think there'd be some value in treating those features
>as orthogonal. I'm not sure how we could wedge them into the syntax
>otherwise, though :-(. You could perhaps convert your aggregate to
>an ordered-set aggregate, but then you'd be paying for a sort that
>you don't need, IIUC.
>

Yeah, having to do the sort (and keep all the data) is exactly what
tdigest is meant to eliminate, so converting it to an ordered-set
aggregate is the one thing I don't want to do. It would also disable
parallel query, which is another reason not to go that way.
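
For reference, the ordered-set version of the call would look something
like this (just a sketch of the syntax, not something I intend to do;
same made-up table/column names as above):

-- the direct arguments before WITHIN GROUP would be evaluated only once,
-- but the input rows get sorted by v first - the very cost tdigest is
-- designed to avoid - and the aggregate could not run in parallel
SELECT tdigest_percentile(100, ARRAY[0.5, 0.9, 0.99]::double precision[])
       WITHIN GROUP (ORDER BY v)
FROM data;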

>> After a while, I've realized that the issue is casting - the CTE
>> produces numeric[] array, and we do the cast to double precision[] on
>> every call to the state transition function (and we do ~10M of those).
>
>The only reason that the CTE reference is cheap is that we understand
>that it's stable so we don't have to recompute it each time; otherwise
>you'd be moaning about that more than the cast. As you say, the short
>term workaround is to do the casting inside the sub-select. I think the
>long term fix is to generically avoid re-computing stable subexpressions.
>There was a patch for that a year or so ago but the author never finished
>it, AFAIR.
>

Hmmm, yeah. I'll dig through the archives, although it's not a very high
priority - it's more a thing that surprised/bugged me while working on
the custom aggregate.
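
For the archives, the short-term workaround you mention - pushing the
cast into the sub-select so it runs once rather than ~10M times - looks
like this (same made-up names as in the sketch above):

-- the cast now happens once, inside the CTE, and the aggregate receives
-- an already-converted double precision[] through the stable sub-select
WITH pct AS (
    SELECT array_agg(x)::double precision[] AS fractions
    FROM unnest(ARRAY[0.5, 0.9, 0.99]) AS x
)
SELECT tdigest_percentile(v, 100, (SELECT fractions FROM pct))
FROM data;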

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
