Re: Statistical aggregate functions are not working with PARTIAL aggregation

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: david(dot)rowley(at)2ndquadrant(dot)com
Cc: andres(at)anarazel(dot)de, rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Statistical aggregate functions are not working with PARTIAL aggregation
Date: 2019-05-20 07:59:05
Message-ID: 20190520.165905.261954393.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello.

I couldn't understand the multiple argument lists with confident
so the patch was born from a guess^^; Sorry for the confusing but
I'm relieved by knowing that it was not so easy to understand.

At Mon, 20 May 2019 17:27:10 +1200, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote in <CAKJS1f_1xfn8navZP05U8BszsG=+CNck_99f_+0j2ccBSrBDkQ(at)mail(dot)gmail(dot)com>
> On Mon, 20 May 2019 at 13:20, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > How
> > about we have something roughly like:
> >
> > int numTransFnArgs = -1;
> > int numCombineFnArgs = -1;
> > Oid transFnInputTypes[FUNC_MAX_ARGS];
> > Oid combineFnInputTypes[2];
> >
> > if (DO_AGGSPLIT_COMBINE(...)
> > numCombineFnArgs = 1;
> > combineFnInputTypes = list_make2(aggtranstype, aggtranstype);
> > else
> > numTransFnArgs = get_aggregate_argtypes(aggref, transFnInputTypes);
> >
> > ...
> >
> > if (DO_AGGSPLIT_COMBINE(...))
> > build_pertrans_for_aggref(pertrans, aggstate, estate,
> > aggref, combinefn_oid, aggtranstype,
> > serialfn_oid, deserialfn_oid,
> > initValue, initValueIsNull,
> > combineFnInputTypes, numCombineFnArgs);
> > else
> > build_pertrans_for_aggref(pertrans, aggstate, estate,
> > aggref, transfn_oid, aggtranstype,
> > serialfn_oid, deserialfn_oid,
> > initValue, initValueIsNull,
> > transFnInputTypes, numTransFnArgs);
> >
> > seems like that'd make the code clearer?
>
> I think that might be a good idea... I mean apart from trying to
> assign a List to an array :) We still must call
> get_aggregate_argtypes() in order to determine the final function, so
> the code can't look exactly like you've written.
>
> > I wonder if we shouldn't
> > strive to have *no* DO_AGGSPLIT_COMBINE specific logic in
> > build_pertrans_for_aggref (except perhaps for an error check or two).
>
> Just so we have a hard copy to review and discuss, I think this would
> look something like the attached.

May I give some comments? They might make me look stupid but I
can't help asking.

- numArguments = get_aggregate_argtypes(aggref, inputTypes);
+ numTransFnArgs = get_aggregate_argtypes(aggref, transFnInputTypes);

If the function retrieves argument types of transform functions,
it would be better that the function name is
get_aggregate_transargtypes() and Aggref.aggargtypes has the name
like aggtransargtypes.

/* Detect how many arguments to pass to the finalfn */
if (aggform->aggfinalextra)
- peragg->numFinalArgs = numArguments + 1;
+ peragg->numFinalArgs = numTransFnArgs + 1;
else
peragg->numFinalArgs = numDirectArgs + 1;

I can understand the aggfinalextra case, but cannot understand
another. As Andres said I think the code requires an explanation
of why the final args is not numTransFnArgs but *numDirectArgs
plus 1*.

+ /*
+ * When combining there's only one input, the to-be-combined
+ * added transition value from below (this node's transition
+ * value is counted separately).
+ */
+ pertrans->numTransInputs = 1;

I believe this works but why the member has not been set
correctly by the creator of the aggsplit?

+ /* Detect how many arguments to pass to the transfn */

I want to have a comment there that explains why what ordered-set
requires is not numTransFnArgs + (#sort cols?), but
"list_length(aggref->args)", or a comment that explanas why they
are compatible to be expplained.

> We do miss out on a few very small optimisations, but I don't think
> they'll be anything we could measure. Namely
> build_aggregate_combinefn_expr() called make_agg_arg() once and used
> it twice instead of calling it once for each arg. I don't think
> that's anything we could measure, especially in a situation where
> two-stage aggregation is being used.
>
> I ended up also renaming aggtransfn to transfn_oid in
> build_pertrans_for_aggref(). Having it called aggtranfn seems a bit
> too close to the pg_aggregate.aggtransfn column which is confusion
> given that we might pass it the value of the aggcombinefn column.

Is Form_pg_aggregate->aggtransfn different thing from
transfn_oid? It seems very confusing to me apart from the naming.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jason Harvey 2019-05-20 10:10:17 pg_upgrade can result in early wraparound on databases with high transaction load
Previous Message Dean Rasheed 2019-05-20 07:33:49 Re: Multivariate MCV stats can leak data to unprivileged users