Re: Parallel Aggregate

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Paul Ramsey <pramsey(at)cleverelephant(dot)ca>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Aggregate
Date: 2016-03-12 03:31:33
Message-ID: CAKJS1f_6_EgzcBEC+28WizenP48SOKbE-6crK6-PkdwynipT5Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11 March 2016 at 03:39, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> A couple of things which I'm not 100% happy with.
>
> 1. make_partialgroup_input_target() doing lookups to the syscache.
> Perhaps this job can be offloaded to a new function in a more suitable
> location. Ideally the Aggref would already store the required
> information, but I don't see a great place to be looking that up.

I've made some changes and moved this work off to a new function in tlist.c.

> 2. I don't really like how I had to add tlist to
> create_grouping_paths(), but I didn't want to go to the trouble of
> calculating the partial agg PathTarget if Parallel Aggregation is not
> possible, as this does do syscache lookups, so it's not free, so I'd
> rather only do it if we're actually going to add some parallel paths.

This is now fixed. The solution was much easier after 49635d7b.

> 3. Something about the force_parallel_mode GUC. I'll think about this
> when I start to think about how to test this, as likely I'll need
> that, else I'd have to create tables bigger than we'd want to in the
> regression tests.

On further analysis it seems that this GUC does not do what I thought
it did, which will be why Robert said that I don't need to think about
it here. The GUC just seems to add a Gather node at the base of the
plan tree, when possible. Which leaves me a bit lost when it comes to
how to write tests for this... It seems like I need to add at least
136k rows to a test table to get a Parallel Aggregate plan, which I
think is a no-go for the regression test suite... that's with
parallel_setup_cost=0;

It would be nice if that GUC just, when enabled, preferred the
cheapest parallel plan (when available), rather than hacking in a
Gather node into the plan's root. This should have the same result in
many cases anyway, and would allow me to test this without generating
oversized tables in the regression tests.

I've attached an updated patch which is based on commit 7087166,
things are really changing fast in the grouping path area at the
moment, but hopefully the dust is starting to settle now.

This patch also fixes a couple of bugs, one in the cost estimates for
the number of groups that will be produced by the final aggregate,
also there was a missing copyObject() in the setrefs.c code which
caused a Var not found in targetlist problem in setrefs.c for plans
with more than 1 partial aggregate node... I had to modify the planner
to get it to add an additional aggregate node to test this (separate
test patch for this is attached).

Comments/Reviews/Testing all welcome.

Thanks

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
parallel_aggregation_8a26089_2016-03-12.patch application/octet-stream 43.5 KB
parallel_aggregation_extra_combine_node.patch application/octet-stream 799 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-03-12 03:40:38 Re: OS X 10.11.3, psql, bus error 10, 9.5.1
Previous Message Tom Lane 2016-03-12 03:30:12 Re: pl/pgSQL, get diagnostics and big data