Re: Deduplicate aggregates and transition functions in planner

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Deduplicate aggregates and transition functions in planner
Date: 2020-10-29 08:17:20
Message-ID: 16a1e564-1fa5-b4ea-c512-9ca133ea4bc0@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 28/10/2020 21:59, Andres Freund wrote:
> On 2020-10-28 21:10:41 +0200, Heikki Linnakangas wrote:
>> Currently, ExecInitAgg() performs quite a lot of work, to deduplicate
>> identical Aggrefs, as well as Aggrefs that can share the same transition
>> state. That doesn't really belong in the executor, we should perform that
>> work in the planner. It doesn't change from one invocation of the plan to
>> another, and it would be nice to reflect the state-sharing in the plan
>> costs.
>
> Woo! Very glad to see this tackled.
>
> It wouldn't surprise me to see a small execution time speedup here -
> I've seen the load of the aggno show up in profiles.

I think you'd be hard-pressed to find a real-life query where it
matters. But if you don't care about real life:

regression=# do $$
begin
for i in 1..100000 loop
perform sum(g), sum(g+0), sum(g+1), sum(g+2), sum(g+3), sum(g+4),
sum(g+5), sum(g+6), sum(g+7), sum(g+8), sum(g+9), sum(g+10) from
generate_series(1,1) g;
end loop;
end;
$$;
DO
Time: 1282.701 ms (00:01.283)

vs.

Time: 860.323 ms

with the patch.

>> @@ -783,14 +783,13 @@ ExecInitExprRec(Expr *node, ExprState *state,
>>
>> scratch.opcode = EEOP_AGGREF;
>> scratch.d.aggref.astate = astate;
>> - astate->aggref = aggref;
>> + astate->aggno = aggref->aggno;
>>
>> if (state->parent && IsA(state->parent, AggState))
>> {
>> AggState *aggstate = (AggState *) state->parent;
>>
>> - aggstate->aggs = lappend(aggstate->aggs, astate);
>> - aggstate->numaggs++;
>> + aggstate->aggs = lappend(aggstate->aggs, aggref);
>
> Hm. Why is aggstate->aggs still built during expression initialization?
> Imo that's a pretty huge wart that also introduces more
> order-of-operation brittleness to executor startup.

The Agg node itself doesn't include any information about the aggregates
and transition functions. Because of that, ExecInitAgg needs a
"representive" Aggref for each transition state and agg, to initialize
the per-trans and per-agg structs. The expression initialization makes
those Aggrefs available for ExecInitAgg.

Instead of collecting all the Aggrefs in a list, ExecInitExprRec() could
set each representative Aggref directly in the right per-trans and
per-agg struct, based on the 'aggno' and 'aggtransno' fields. That
requires initializing the per-trans and per-agg arrays earlier, and for
that, we would need to store the # of aggs and transition states in the
Agg node, like you also suggested. Certainly doable, but on the whole,
it didn't really seem better to me. Attached is a patch, to demonstrate
what that looks like, on top of the main patch. It's not complete,
there's at least one case with hash-DISTINCT for queries like "SELECT
DISTINCT aggregate(x) ..." where the planner creates an Agg for the
DISTINCT without aggregates, but the code currently passes numAggs=1 to
the executor. Some further changes would be needed in the planner, to
mark the AggPath generated for deduplication differently from the
AggPaths created for aggregation. Again that's doable, but on the whole
I prefer the approach to scan the Aggrefs in executor startup, for now.

I'd like to get rid of the "representative Aggrefs" altogether, and pass
information about the transition and final functions from planner to
executor in some other form. But that's exactly what got me into the
refactoring that was ballooning out of hand that I mentioned.

- Heikki

Attachment Content-Type Size
pass-numaggs-from-planner.patch text/x-patch 12.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2020-10-29 08:50:44 Re: Parallel copy
Previous Message Masahiro Ikeda 2020-10-29 08:03:56 Re: Add statistics to pg_stat_wal view for wal related parameter tuning