Re: Partition-wise aggregation/grouping

From: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise aggregation/grouping
Date: 2017-11-01 12:50:11
Message-ID: CAM2+6=X8VFpCvq=d2vYMxmTEmeC-Fb-_NwbZgvSALuBrDzU3hA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Oct 28, 2017 at 3:07 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Fri, Oct 27, 2017 at 1:01 PM, Jeevan Chalke
> <jeevan(dot)chalke(at)enterprisedb(dot)com> wrote:
> > 1. Added separate patch for costing Append node as discussed up-front in
> the
> > patch-set.
> > 2. Since we now cost Append node, we don't need
> > partition_wise_agg_cost_factor
> > GUC. So removed that. The remaining patch hence merged into main
> > implementation
> > patch.
> > 3. Updated rows in test-cases so that we will get partition-wise plans.
>
> With 0006 applied, cost_merge_append() is now a little bit confused:
>
> /*
> * Also charge a small amount (arbitrarily set equal to operator cost)
> per
> * extracted tuple. We don't charge cpu_tuple_cost because a
> MergeAppend
> * node doesn't do qual-checking or projection, so it has less overhead
> * than most plan nodes.
> */
> run_cost += cpu_operator_cost * tuples;
>
> /* Add MergeAppend node overhead like we do it for the Append node */
> run_cost += cpu_tuple_cost * DEFAULT_APPEND_COST_FACTOR * tuples;
>
> The first comment says that we don't add cpu_tuple_cost, and the
> second one then adds half of it anyway.
>

Yep.
But as David reported earlier, if we remove the first part i.e. adding
cpu_operator_cost per tuple, Merge Append will be preferred over an Append
node unlike before. And thus, I thought of better having both, but no so
sure. Should we remove that part altogether, or add both in a single
statement with updated comments?

> I think it's fine to have a #define for DEFAULT_APPEND_COST_FACTOR,
> because as you say it's used twice, but I don't think that should be
> exposed in cost.h; I'd make it private to costsize.c and rename it to
> something like APPEND_CPU_COST_MULTIPLIER. The word DEFAULT, in
> particular, seems useless to me, since there's no provision for it to
> be overridden by a different value.
>

Agree. Will make that change.

>
> What testing, if any, can we think about doing with this plan to make
> sure it doesn't regress things? For example, if we do a TPC-H run
> with partitioned tables and partition-wise join enabled, will any
> plans change with this patch?

I have tried doing this on my local developer machine. For 1GB database
size (tpc-h scale factor 1), I see no plan change with and without this
patch.

I have tried with scale factor 10, but query is not executing well due to
space and memory constraints. Can someone try out that?

> Do they get faster or not? Anyone have
> other ideas for what to test?
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ildus Kurbangaliev 2017-11-01 13:24:16 proposal: extend shm_mq to support more use cases
Previous Message Chris Travers 2017-11-01 12:43:35 Re: Oracle to PostGre