Re: upper planner path-ification

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: upper planner path-ification
Date: 2015-06-23 08:41:07
Message-ID: 9A28C8860F777E439AA12E8AEA7694F8011093D4@BPXM15GP.gisp.nec.co.jp
Lists: pgsql-hackers

> -----Original Message-----
> From: David Rowley [mailto:david(dot)rowley(at)2ndquadrant(dot)com]
> Sent: Tuesday, June 23, 2015 2:06 PM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Robert Haas; pgsql-hackers(at)postgresql(dot)org; Tom Lane
> Subject: Re: [HACKERS] upper planner path-ification
>
>
> On 23 June 2015 at 13:55, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> wrote:
>
>
> > Once we support adding aggregation paths during path consideration,
> > we need to pay attention to how the final target list morphs according
> > to the intermediate path combination tentatively chosen.
> > For example, if partial aggregation makes sense from a cost perspective,
> > such as SUM(NRows) over partial COUNT(*) AS NRows instead of COUNT(*) on
> > a billion rows, the planner also has to adjust the final target list
> > according to the interim paths. In this case, the final output shall be
> > SUM() instead of COUNT().
>
>
>
>
> This sounds very much like what's been discussed here:
>
> http://www.postgresql.org/message-id/CA+U5nMJ92azm0Yt8TT=hNxFP=VjFhDqFpaWfmj+66-4zvCGv3w(at)mail(dot)gmail(dot)com
>
>
> The basic concept is that we add another function set to aggregates that allow
> the combination of 2 states. For the case of MIN() and MAX() this will just be
> the same as the transfn. SUM() is similar for many types, more complex for others.
> I've quite likely just borrowed SUM(BIGINT)'s transfer functions to allow
> COUNT()'s to be combined.
>
STDDEV, VARIANCE and related aggregates can be constructed from nrows, sum(X)
and sum(X^2). REGR_*, COVAR_* and related aggregates can be constructed from
nrows, sum(X), sum(Y), sum(X^2), sum(Y^2) and sum(X*Y).
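
To make the idea concrete, here is a minimal sketch in Python (not C, just
for brevity) of how such partial states might be combined. All names here
(PartialState, transition, combine, var_samp_x, covar_samp) are hypothetical
and are not taken from any patch or from the PostgreSQL source. Note that the
row count is combined by addition, which is the "SUM() of partial COUNT(*)"
case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartialState:
    """Hypothetical partial-aggregate state: row count plus power sums."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    syy: float = 0.0   # kept for REGR_*/CORR-style results, unused below
    sxy: float = 0.0


def transition(s, x, y):
    """Fold one input row into the state (the role of a transition function)."""
    return PartialState(s.n + 1, s.sx + x, s.sy + y,
                        s.sxx + x * x, s.syy + y * y, s.sxy + x * y)


def combine(a, b):
    """Merge two partial states; every field is simply added."""
    return PartialState(a.n + b.n, a.sx + b.sx, a.sy + b.sy,
                        a.sxx + b.sxx, a.syy + b.syy, a.sxy + b.sxy)


def aggregate(rows):
    """Run the transition function over an iterable of (x, y) rows."""
    state = PartialState()
    for x, y in rows:
        state = transition(state, x, y)
    return state


def var_samp_x(s):
    """Sample variance of X from the state; None when fewer than 2 rows."""
    if s.n < 2:
        return None
    return (s.sxx - s.sx * s.sx / s.n) / (s.n - 1)


def covar_samp(s):
    """Sample covariance of (X, Y) from the state; None when fewer than 2 rows."""
    if s.n < 2:
        return None
    return (s.sxy - s.sx * s.sy / s.n) / (s.n - 1)


if __name__ == "__main__":
    rows = [(float(i), 2.0 * i) for i in range(1, 7)]
    # Two "workers" each aggregate a slice, then the states are combined.
    merged = combine(aggregate(rows[:2]), aggregate(rows[2:]))
    print(merged.n, var_samp_x(merged), covar_samp(merged))
```

The final functions only ever see the combined state, so they do not care
how many partial aggregations fed into it. The plain sum-of-squares formula
is used here only because it matches the decomposition above; it can lose
precision through cancellation when the variance is small relative to the
mean.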

Let me point out a positive side effect of this approach.
Because the final aggregate function processes values that are already
partially aggregated, the state value and each incoming value tend to be
closer in magnitude. That can reduce accumulated rounding error in
floating-point calculation. :-)
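
A deliberately contrived Python sketch of that effect, using ordinary
double-precision floats. The helper name running_sum and the way rows are
split between two "workers" are assumptions for illustration only; how much
(or whether) precision improves in practice depends on the data and on how
rows are distributed.

```python
def running_sum(values):
    """Plain left-to-right accumulation, like a simple transition function."""
    total = 0.0
    for v in values:
        total += v
    return total


big = 2.0 ** 53              # above this, doubles can no longer represent +1 steps
rows = [big] + [1.0] * 8

# One pass: each 1.0 is added to a huge state value and is rounded away.
one_pass = running_sum(rows)

# Two phases: the small values are summed together first, then combined.
partials = [running_sum(rows[:1]), running_sum(rows[1:])]
two_phase = running_sum(partials)

if __name__ == "__main__":
    print(one_pass == big)           # the eight 1.0 values were lost
    print(two_phase - one_pass)      # difference recovered by two-phase summation
```

In the one-pass case the state is about 2^53 while each incoming value is
1.0, so every addition rounds back to the old state. In the two-phase case
the second partial sum is 8.0 by the time it meets the large value, and the
addition is exact.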

> More time does need to be spent inventing the new combining functions that
> don't currently exist, but that shouldn't matter as it can be done later.
>
> Commitfest link to patch here https://commitfest.postgresql.org/5/131/
>
> I see you've signed up to review it!
>
Yes, we are all looking in the same direction.

The problem is that we have to cross the mountain of planner enhancements to
reach all the valuable features:
- parallel aggregation
- aggregation before join
- remote aggregation via FDW

So, unless we find a solution on the planner side, 2-phase aggregation is
like curry without rice....

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
