Re: Sharing aggregate states between different aggregate functions

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sharing aggregate states between different aggregate functions
Date: 2015-07-27 08:11:52
Message-ID: 55B5E7C8.8090007@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 07/27/2015 08:34 AM, David Rowley wrote:
> - * agg_input_types, agg_state_type, agg_result_type identify the input,
> - * transition, and result types of the aggregate. These should all be
> - * resolved to actual types (ie, none should ever be ANYELEMENT etc).
> + * agg_input_types identifies the input types of the aggregate. These
> should
> + * be resolved to actual types (ie, none should ever be ANYELEMENT etc).
>
> I'm not sure I understand why agg_state_type and agg_result_type have
> changed here.

The function no longer has the agg_result_type argument, but the removal
of agg_state_type from the comment was a mistake.

> + peraggstate->sortstates = (Tuplesortstate **)
> + palloc0(sizeof(Tuplesortstate *) * numGroupingSets);
> + for (currentsortno = 0; currentsortno < numGroupingSets; currentsortno++)
> + peraggstate->sortstates[currentsortno] = NULL;
>
> This was not you, but this NULL setting looks unneeded due to the palloc0().

Yeah, I noticed that too. Ok, let's take it out.

> In this function I also wasn't quite sure if it was with comparing both
> non-NULL INITCOND's here. I believe my code comments may slightly
> contradict what the code actually does, as the comments talk about them
> having to match, but the code just bails if any are non-NULL. The reason I
> didn't check them was because it seems inevitable that some duplicate work
> needs to be done when setting up the INITCOND. Perhaps it's worth it?

It would be nice to handle non-NULL initconds. I think you'll have to
check that the input function isn't volatile. Or perhaps just call the
input function, and check that the resulting Datum is byte-per-byte
identical, although that might be awkward to do with the current code
structure.

>> BTW, the name of the AggStatePerAggStateData struct is pretty horrible.
>> The repeated "AggState" feels awkward. Now that I've stared at the patch
>> for a some time, it doesn't bother me anymore, but it took me quite a while
>> to over that. I'm sure it will for others too. And it's not just that
>> struct, the comments talk about "aggregate state", which could be confused
>> to mean "AggState", but it actually means AggStatePerAggStateData. I don't
>> have any great suggestions, but can you come up a better naming scheme?
>
> I agree, they're horrible. The thing that's causing the biggest problem is
> the struct named AggState, which carries state for *all* aggregates, and
> has nothing to do with "transition state", so it seems there's two
> different meanings if the word "state" and I've used both meanings in the
> name for AggStatePerAggStateData.
>
> Perhaps just renaming AggStatePerAggStateData to AggStateTransStateData
> would be good enough?

Hmm. I think it should be "AggStatePerTransData" then, to keep the same
pattern as AggStatePerAggData and AggStatePerGroupData.

> I've attached a delta patch based on your patch, in this I've:
>
> 1. Renamed AggStatePerAggStateData to AggStateTransStateData and all
> variables using that are renamed to suit better.
> 2. Removed surplus peraggstate->sortstates[currentsortno] = NULL; (not
> related to this patch, but since we're moving that part of the code, we'd
> better fix)
> 3. Put back the missing aggfnoid from the error message.
> 4. Changed initialize_aggregates() to not pass the states. They're already
> in AggState and we're using aggstate->numstates to get the count of the
> items in that array, so it seems wrong to allow a different array to ever
> be passed in.
> 5. Changed wording of a few comments to try and reduce confusing of 'state'
> and 'transition state'.
> 6. Renamed AggState.peraggstate to transstates. I pluralised this to try to
> reduce confusion of the single state pointers named 'transstate' in the
> functions in nodeAgg.c. I did think that peragg should also become peraggs
> and pergroup should become pergroups, but didn't change those.
>
> Anything else I changed is self explanatory.
>
> What do you think?

Looks good, thanks!

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2015-07-27 08:41:39 Re: raw output from copy
Previous Message Marc Mamin 2015-07-27 07:52:00 Re: pg_dump -Fd and compression level