Re: overhead due to casting extra parameters with aggregates (over and over)

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: overhead due to casting extra parameters with aggregates (over and over)
Date: 2019-09-23 17:44:47
Message-ID: 20190923174447.lncaq7g5l244azia@development
Lists: pgsql-hackers

On Mon, Sep 23, 2019 at 12:53:36PM -0400, Tom Lane wrote:
>Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
>> I've been working on a custom aggregate, and I've run into some fairly
>> annoying overhead due to casting direct parameters over and over. I'm
>> wondering if there's a way to eliminate this, somehow, without having to
>> do an explicit cast.
>
>> Imagine you have a simple aggregate:
>
>> CREATE AGGREGATE tdigest_percentile(double precision, int, double precision[])
>> (
>> ...
>> );
>
>> with two direct parameters (actually, I'm not sure that's the correct
>> term, because this is not an ordered-set aggregate and [1] only talks
>> about direct parameters in that context). Anyway, I'm talking about the
>> extra parameters, after the 'double precision' value to aggregate.
>
>But you're not telling the system that those are direct parameters,
>at least not if you mean that they can only legitimately have one value
>across the whole query. As-is, they're just more aggregated arguments
>so we have to evaluate them again at each row.
>

Understood.
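
To make this concrete, the pattern I ran into looks roughly like this
(a simplified sketch - tdigest_percentile is the aggregate from above,
but the table, column, and CTE names are made up):

-- fractions comes out of the CTE as numeric[]; the cast to
-- double precision[] sits outside the sub-select, so it is part of the
-- aggregate's argument expressions and gets re-evaluated for every row
WITH pct AS (
    SELECT array_agg(x) AS fractions        -- numeric[], not float8[]
    FROM unnest(ARRAY[0.5, 0.9, 0.99]) AS x
)
SELECT tdigest_percentile(v, 100,
                          (SELECT fractions FROM pct)::double precision[])
FROM data;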

>It's fairly messy that the SQL spec ties direct arguments to ordered-set
>aggregates; you'd think there'd be some value in treating those features
>as orthogonal. I'm not sure how we could wedge them into the syntax
>otherwise, though :-(. You could perhaps convert your aggregate to
>an ordered-set aggregate, but then you'd be paying for a sort that
>you don't need, IIUC.
>

Yeah, having to do the sort (and keep all the data) is exactly what
tdigest is meant to eliminate, so converting it to an ordered-set
aggregate is the one thing I don't want to do. It would also disable
parallel query, which is another reason not to go that way.
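
For reference, the ordered-set version of the call would look something
like this (just a sketch of the syntax, not something I intend to do;
same made-up table/column names as above):

-- the direct arguments before WITHIN GROUP would be evaluated only once,
-- but the input rows get sorted by v first - the very cost tdigest is
-- designed to avoid - and the aggregate could not run in parallel
SELECT tdigest_percentile(100, ARRAY[0.5, 0.9, 0.99]::double precision[])
       WITHIN GROUP (ORDER BY v)
FROM data;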

>> After a while, I've realized that the issue is casting - the CTE
>> produces numeric[] array, and we do the cast to double precision[] on
>> every call to the state transition function (and we do ~10M of those).
>
>The only reason that the CTE reference is cheap is that we understand
>that it's stable so we don't have to recompute it each time; otherwise
>you'd be moaning about that more than the cast. As you say, the short
>term workaround is to do the casting inside the sub-select. I think the
>long term fix is to generically avoid re-computing stable subexpressions.
>There was a patch for that a year or so ago but the author never finished
>it, AFAIR.
>

Hmmm, yeah. I'll dig through the archives, although it's not a very high
priority - it's more a thing that surprised/bugged me while working on
the custom aggregate.
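
For the archives, the short-term workaround you mention - pushing the
cast into the sub-select so it runs once rather than ~10M times - looks
like this (same made-up names as in the sketch above):

-- the cast now happens once, inside the CTE, and the aggregate receives
-- an already-converted double precision[] through the stable sub-select
WITH pct AS (
    SELECT array_agg(x)::double precision[] AS fractions
    FROM unnest(ARRAY[0.5, 0.9, 0.99]) AS x
)
SELECT tdigest_percentile(v, 100, (SELECT fractions FROM pct))
FROM data;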

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
