Re: Deduplicate aggregates and transition functions in planner

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Deduplicate aggregates and transition functions in planner
Date: 2020-11-19 10:38:51
Message-ID: 97969583-7f1b-c115-638d-fd8197818650@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 29/10/2020 19:48, Andres Freund wrote:
> On 2020-10-29 10:17:20 +0200, Heikki Linnakangas wrote:
>> On 28/10/2020 21:59, Andres Freund wrote:
>>> It wouldn't surprise me to see a small execution time speedup here -
>>> I've seen the load of the aggno show up in profiles.
>>
>> I think you'd be hard-pressed to find a real-life query where it
>> matters. But if you don't care about real life:
>
> I was actually thinking about a different angle - that the evaluation of
> an Aggref can be faster, because we need less indirection to find the
> aggno. As you have already implemented for the JITed code, but removing
> it for the expression code looks easy enough too. You'd need a lot of
> groups and presumably a fair number of Aggrefs to see it.
>
> Attached is a quick version of what I am thinking wrt AggrefExprState.

Ah, I see, makes sense.

>> The Agg node itself doesn't include any information about the aggregates and
>> transition functions. Because of that, ExecInitAgg needs a "representive"
>> Aggref for each transition state and agg, to initialize the per-trans and
>> per-agg structs. The expression initialization makes those Aggrefs available
>> for ExecInitAgg.
>
>> Instead of collecting all the Aggrefs in a list, ExecInitExprRec() could set
>> each representative Aggref directly in the right per-trans and per-agg
>> struct, based on the 'aggno' and 'aggtransno' fields.
>
> Hold on a second: To me the question is why is it the right design that
> the Agg node doesn't have the information about "aggregates and
> transition functions"? Agg e.g. already does directly contains the group
> keys...
>
> My concern wouldn't really be addressed if we replace the lappend() in
> ExecInitExprRec() with setting something "directly in the right
> per-trans...". I think it's structurally wrong to have to discover
> Aggrefs at execution time at all.
>
> Perhaps the easiest incremental step would be to have something like a
> CookedAggref { int aggno; } and then just store the Aggref nodes in
> Agg->aggs, with aggno referencing that...

I started hacking on that CookedAggref approach, but it wasn't as simple
as it seemed. I tried to replace the Aggrefs with CookedAggrefs in
setrefs.c, but when set_plan_references() replaces expressions with Vars
referring to the output of a subnode, it needs to be able to match an
Aggref at an upper node to a CookedAggref on the node below.
Furthermore, the deparsing code in ruleutils.c needs to be able to find
the original Aggrefs, in order to print them nicely. All of that is
solvable, I'm sure, but it's not trivial. And it's new code that mostly
builds on top of attached patch, so I think that can be done separately
later, it doesn't need to block this patch.

So barring objections, I'm going to push the attached updated patch that
includes the removal of AggrefExprState, and leave CookedAggrefs or
other further refactorings for the future.

- Heikki

Attachment Content-Type Size
v2-0001-Move-peragg-and-pertrans-duplicate-finding-to-pre.patch text/x-patch 87.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2020-11-19 10:40:19 Re: ResourceOwner refactoring
Previous Message Craig Ringer 2020-11-19 10:33:49 Re: Add LWLock blocker(s) information