Re: POC: GROUP BY optimization

From: Zhihong Yu <zyu(at)yugabyte(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: GROUP BY optimization
Date: 2022-03-29 02:02:51
Message-ID: CALNJ-vQp3WNV-kKuGsjSOhjRfdECA1Dbxa-vjAgOmQ4f4Qaw-Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 28, 2022 at 5:49 PM Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
wrote:

> Hi,
>
> Here's a rebased/improved version of the patch, with smaller parts
> addressing various issues. There are seven parts:
>
> 0001 - main part, just rebased
>
> 0002 - replace the debug GUC options with a single GUC to disable the
> optimization if needed
>
> 0003 - minor code cleanup, removal of unnecessary variable
>
> 0004 - various comment fixes (rewordings, typos, ...)
>
> 0005 - a minor code simplification, addressing FIXMEs from 0004
>
> 0006 - adds the new GUC to the docs
>
> 0007 - demonstrates plan changes with a disabled optimization
>
> The first 6 parts should be squashed and committed at one, I only kept
> them separate for clarity. The 0007 is merely a demonstration of the new
> GUC and that it disables the optimization.
>
> > Agree. Because it is a kind of automation we should allow user to switch
> > it off in the case of problems or manual tuning.
> > > Also, I looked through this patch. It has some minor problems:
> > 1. Multiple typos in the patch comment.
>
> I went through the comments and checked all of them for grammar mistakes
> and typos using a word processor, so hopefully that should be OK. But
> maybe there's still something wrong.
>
> > 2. The term 'cardinality of a key' - may be replace with 'number of
> > duplicates'?
>
> No, cardinality means "number of distinct values", so "duplicates" would
> be wrong. And I think "cardinality" is well established term, so I think
> it's fine.
>
> BTW I named the GUC enable_group_by_reordering, I wonder if it should be
> named differently, e.g. enable_groupby_reordering? Opinions?
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

Hi,

For 0001-Optimize-order-of-GROUP-BY-keys-20220328.patch:

multiple parametes need to be

parametes -> parameters

leave more expensive comparions

comparions -> comparisons

+ if (has_fake_var == false)

The above can be written as:

if (!has_fake_var)

+ nGroups = ceil(2.0 + sqrt(tuples) * (i + 1) /
list_length(pathkeys));

Looks like the value of tuples doesn't change inside the loop.
You can precompute sqrt(tuples) outside the loop and store the value in a
variable.

+ return -1;
+ else if (a->cost == b->cost)
+ return 0;
+ return 1;

the keyword 'else' is not needed.

+ * Returns newly allocated lists. If no reordering is possible (or needed),
+ * the lists are set to NIL.
+ */
+static bool
+get_cheapest_group_keys_order(PlannerInfo *root, double nrows,

It seems the comment for return value doesn't match the bool return type.

+ /* If this optimization is disabled, we're done. */
+ if (!debug_cheapest_group_by)

It seems enable_cheapest_group_by would be better name for the flag.

Cheers

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2022-03-29 02:03:17 Re: MDAM techniques and Index Skip Scan patch
Previous Message Peter Geoghegan 2022-03-29 01:58:22 Re: MDAM techniques and Index Skip Scan patch