Re: Deduplicate aggregates and transition functions in planner

From: Andres Freund <andres(at)anarazel(dot)de>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Deduplicate aggregates and transition functions in planner
Date: 2020-10-28 19:59:01
Message-ID: 20201028195901.t55yiwmxe3quguot@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2020-10-28 21:10:41 +0200, Heikki Linnakangas wrote:
> Currently, ExecInitAgg() performs quite a lot of work, to deduplicate
> identical Aggrefs, as well as Aggrefs that can share the same transition
> state. That doesn't really belong in the executor, we should perform that
> work in the planner. It doesn't change from one invocation of the plan to
> another, and it would be nice to reflect the state-sharing in the plan
> costs.

Woo! Very glad to see this tackled.

It wouldn't surprise me to see a small execution time speedup here -
I've seen the load of the aggno show up in profiles.

> Attached is a patch to do that. It adds two new fields to Aggref, 'aggno'
> and 'aggtransno', to identify the unique aggregate and transition states.
> The duplicates are detected, and those filled in, early in the planning.
> Aside from those fields, the planner doesn't pass any other new information
> to to the executor, so the the executor still has to do syscache lookups to
> get the transition, combine etc. functions.

> I tried a bigger refactoring at first, to pass more information from the
> planner to the executor, but the patch grew really large before I got very
> far with it. So as the first step, I think we should apply the attached
> patch, and further refactoring can be done after that, if it seems
> worthwhile.

Working incrementally makes sense.

> @@ -783,14 +783,13 @@ ExecInitExprRec(Expr *node, ExprState *state,
>
> scratch.opcode = EEOP_AGGREF;
> scratch.d.aggref.astate = astate;
> - astate->aggref = aggref;
> + astate->aggno = aggref->aggno;
>
> if (state->parent && IsA(state->parent, AggState))
> {
> AggState *aggstate = (AggState *) state->parent;
>
> - aggstate->aggs = lappend(aggstate->aggs, astate);
> - aggstate->numaggs++;
> + aggstate->aggs = lappend(aggstate->aggs, aggref);

Hm. Why is aggstate->aggs still built during expression initialization?
Imo that's a pretty huge wart that also introduces more
order-of-operation brittleness to executor startup.

Since AggRef now knows its aggno, the state for EEOP_AGGREF should be
changed to be just an int, instead of a pointer to AggrefExprState..

> @@ -3432,8 +3426,18 @@ ExecInitAgg(Agg *node, EState *estate, int eflags)
> /*
> * We should now have found all Aggrefs in the targetlist and quals.
> */
> - numaggs = aggstate->numaggs;
> - Assert(numaggs == list_length(aggstate->aggs));
> + numaggrefs = list_length(aggstate->aggs);
> + max_aggno = -1;
> + max_transno = -1;
> + foreach(l, aggstate->aggs)
> + {
> + Aggref *aggref = (Aggref *) lfirst(l);
> +
> + max_aggno = Max(max_aggno, aggref->aggno);
> + max_transno = Max(max_transno, aggref->aggtransno);
> + }
> + numaggs = max_aggno + 1;
> + numtrans = max_transno + 1;

We must have previously determined this, why don't we stash it in struct
Agg?

> --- a/src/backend/jit/llvm/llvmjit_expr.c
> +++ b/src/backend/jit/llvm/llvmjit_expr.c
> @@ -1850,19 +1850,11 @@ llvm_compile_expr(ExprState *state)
> case EEOP_AGGREF:
> {
> AggrefExprState *aggref = op->d.aggref.astate;
> - LLVMValueRef v_aggnop;
> LLVMValueRef v_aggno;
> LLVMValueRef value,
> isnull;
>
> - /*
> - * At this point aggref->aggno is not yet set (it's set up
> - * in ExecInitAgg() after initializing the expression). So
> - * load it from memory each time round.
> - */
> - v_aggnop = l_ptr_const(&aggref->aggno,
> - l_ptr(LLVMInt32Type()));
> - v_aggno = LLVMBuildLoad(b, v_aggnop, "v_aggno");
> + v_aggno = l_int32_const(aggref->aggno);

Yay!

> +/*
> + * get_agg_clause_costs
> + * Recursively find the Aggref nodes in an expression tree, and
> + * accumulate cost information about them.

Think this comment is out of date now.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2020-10-28 20:23:59 Re: Internal key management system
Previous Message John Naylor 2020-10-28 19:58:33 Re: duplicate function oid symbols