Re: Parallel Aggregate

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Paul Ramsey <pramsey(at)cleverelephant(dot)ca>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Aggregate
Date: 2016-03-14 23:56:29
Message-ID: CAKJS1f8L3g432dDTn2QWf1Z2PaeXotDbf2xCBLqa=9pMzQXoEQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 15 March 2016 at 08:53, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I haven't fully studied every line of this yet, but here are a few comments:
>
> + case T_PartialAggref:
> + coll = InvalidOid; /* XXX is this correct? */
> + break;
>
> I doubt it.

Thanks for looking at this.

Yeah, I wasn't so sure of the collation thing either, so stuck a
reminder on there. The way I'm seeing it at the moment is that since
the partial aggregate is never displayed to the user, and we never
perform equality comparison on them (since HAVING is applied in the
final aggregate stage), then my line of thought was that the collation
should not matter. Over on the combine aggregate states thread I'm
doing work to make the standard serialize functions use bytea, and
bytea don't allow collate:

# create table c (b bytea collate "en_NZ");
ERROR: collations are not supported by type bytea
LINE 1: create table c (b bytea collate "en_NZ");

I previously did think of reusing the Aggref's collation, but I ended
up leaning towards the more "does not matter" side of the argument. Of
course, I may have completely failed to think of some important
reason, which is why I left that comment, so it might provoke some
thought with someone else with more collation knowledge.

> More generally, why are we inventing PartialAggref
> instead of reusing Aggref? The code comments seem to contain no
> indication as to why we shouldn't need all the same details for
> PartialAggref that we do for Aggref, instead of only a small subset of
> them. Hmm... actually, it looks like PartialAggref is intended to
> wrap Aggref, but that seems like a weird design. Why not make Aggref
> itself DTRT? There's not enough explanation in the patch of what is
> going on here and why.

A comment does explain this, but perhaps it's not good enough, so I've
rewritten it to become.

/*
* PartialAggref
*
* When partial aggregation is required in a plan, the nodes from the partial
* aggregate node, up until the finalize aggregate node must pass the partially
* aggregated states up the plan tree. In regards to target list construction
* in setrefs.c, this requires that exprType() returns the state's type rather
* than the final aggregate value's type, and since exprType() for Aggref is
* coded to return the aggtype, this is not correct for us. We can't fix this
* by going around modifying the Aggref to change it's return type as setrefs.c
* requires searching for that Aggref using equals() which compares all fields
* in Aggref, and changing the aggtype would cause such a comparison to fail.
* To get around this problem we wrap the Aggref up in a PartialAggref, this
* allows exprType() to return the correct type and we can handle a
* PartialAggref in setrefs.c by just peeking inside the PartialAggref to check
* the underlying Aggref. The PartialAggref lives as long as executor start-up,
* where it's removed and replaced with it's underlying Aggref.
*/
typedef struct PartialAggref

does that help explain it better?

> }
> + if (can_parallel)
> + {
>
> Seems like a blank line would be in order.

Fixed.

> I don't see where this applies has_parallel_hazard or anything
> comparable to the aggregate functions. I think it needs to do that.

Not sure what you mean here.

>
> + /* XXX +1 ? do we expect the main process to actually do real work? */
> + numPartialGroups = Min(numGroups, subpath->rows) *
> + (subpath->parallel_degree + 1);
> I'd leave out the + 1, but I don't think it matters much.

Actually I meant to ask you about this. I see that subpath->rows is
divided by the Path's parallel_degree, but it seems the main process
does some work too, so this is why I added + 1, as during my tests
using a query which produces 10 groups, and had 4 workers, I noticed
that Gather was getting 50 groups back, rather than 40, I assumed this
is because the main process is helping too, but from my reading of the
parallel query threads I believe this is because the Gather, instead
of sitting around idle tries to do a bit of work too, if it appears
that nothing else is happening quickly enough. I should probably go
read nodeGather.c to learn that though.

In the meantime I've removed the + 1, as it's not correct to do
subpath->rows * (subpath->parallel_degree + 1), since it was divided
by subpath->parallel_degree in the first place, we'd end up with an
extra worker's worth of rows for queries which estimate a larger
number of groups than partial path rows.

> + aggstate->finalizeAggs == true)
>
> We usually just say if (a) not if (a == true) when it's a boolean.
> Similarly !a rather than a == false.

hmm, thanks. It appears that I've not been all that consistent in that
area. I didn't know that was convention. I see that some of my way
have crept into the explain.c changes already :/

I will send an updated patch once I address Tomas' concerns too.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kouhei Kaigai 2016-03-15 00:03:40 Re: Reworks of CustomScan serialization/deserialization
Previous Message James Sewell 2016-03-14 23:52:36 Re: Parallel Aggregate