Re: Sharing aggregate states between different aggregate functions

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sharing aggregate states between different aggregate functions
Date: 2015-07-26 15:24:31
Message-ID: 55B4FBAF.7000408@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 07/09/2015 12:44 PM, David Rowley wrote:
> On 15 June 2015 at 12:05, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
>
>>
>> This basically allows an aggregate's state to be shared between other
>> aggregate functions when both aggregate's transition functions (and a few
>> other things) match
>> There's quite a number of aggregates in our standard set which will
>> benefit from this optimisation.
>>
> After compiling the original patch with another compiler, I noticed a
> couple of warnings.
>
> The attached fixes these.

I spent some time reviewing this. I refactored the ExecInitAgg code
rather heavily to make it more readable (IMHO); see attached. What do
you think? Did I break anything?

Some comments:

* In aggref_has_compatible_states(), you give up if aggtype or aggcollid
differ. But those properties apply to the final function, so you were
leaving some money on the table by disallowing state-sharing if they differ.

* The filter and input expressions are initialized for every AggRef,
before the deduplication logic kicks in. The AggrefExprState.aggfilter,
aggdirectargs and args fields really belong to the AggStatePerAggState
struct instead. This is not a new issue, but now that we have a
convenient per-aggstate struct to put them in, let's do so.

* There was a reference-after free bug in aggref_has_compatible_states;
you cannot ReleaseSysCache and then continue pointing to the struct.

* The code was a bit fuzzy on which parts of the per-aggstate are filled
in at what time. Some of the fields were overwritten every time a match
was found. With the same values, so no harm done, but I found it
confusing. I refactored ExecInitAgg in the attached patch to clear that up.

* There API of build_aggregate_fnexprs() was a bit strange now that some
callers use it to only fill in the final function, some call it to fill
both the transition functions and the final function. I split it to two
separate functions.

* I wonder if we should do this duplicate elimination at plan time. It's
very fast, so I'm not worried about that right now, but you had grand
plans to expand this so that an aggregate could optionally use one of
many different kinds of state values. At that point, it certainly seems
like a planning decision to decide which aggregates share state. I think
we can leave it as it is for now, but it's something to perhaps consider
later.

BTW, the name of the AggStatePerAggStateData struct is pretty horrible.
The repeated "AggState" feels awkward. Now that I've stared at the patch
for a some time, it doesn't bother me anymore, but it took me quite a
while to over that. I'm sure it will for others too. And it's not just
that struct, the comments talk about "aggregate state", which could be
confused to mean "AggState", but it actually means
AggStatePerAggStateData. I don't have any great suggestions, but can you
come up a better naming scheme?

- Heikki

Attachment Content-Type Size
sharing_aggstate-heikki-1.patch application/x-patch 66.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2015-07-26 15:29:03 Re: Buildfarm failure from overly noisy warning message
Previous Message Andres Freund 2015-07-26 15:15:56 Re: [HACKERS] GSets: Getting collation related error when GSets has text column