Re: Planner, check if can use consider HASH for groupings (src/backend/optimizer/plan/planner.c)

From: Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Planner, check if can use consider HASH for groupings (src/backend/optimizer/plan/planner.c)
Date: 2020-09-20 23:09:56
Message-ID: CAEudQAptB=Tqovp8bRofJHu34tTdMGVVN1jAfrJjnXHpCj+wXg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Em sex., 18 de set. de 2020 às 10:37, Tomas Vondra <
tomas(dot)vondra(at)2ndquadrant(dot)com> escreveu:

> On Thu, Sep 17, 2020 at 06:31:12PM -0300, Ranier Vilela wrote:
> >Em qui., 17 de set. de 2020 às 17:09, Tomas Vondra <
> >tomas(dot)vondra(at)2ndquadrant(dot)com> escreveu:
> >
> >> On Thu, Sep 17, 2020 at 02:12:12PM -0300, Ranier Vilela wrote:
> >> >Hi,
> >> >
> >> >In case gd->any_hashable is FALSE, grouping_is_hashable is never
> called.
> >> >In this case, the planner could use HASH for groupings, but will never
> >> know.
> >> >
> >>
> >> The condition is pretty simple - if the query has grouping sets, look at
> >> grouping sets, otherwise look at groupClause. For this to be an issue
> >> the query would need to have both grouping sets and (independent) group
> >> clause - which AFAIK is not possible.
> >>
> >Uh?
> >(parse->groupClause != NIL) If different from NIL we have ((independent)
> >group clause), grouping_is_hashable should check?
> >(gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause))))
> >If gd is not NIL and gd->any_hashtable is FALSE?
> >
>
> Sorry, I'm not quite sure I understand what this is meant to say :-(
>
> Anyway, (groupClause != NIL) does not mean the groupClause is somehow
> independent (of what?). Add some debugging to create_grouping_paths and
> you'll see that e.g. this query ends up with groupClause with 3 items:
>
> select 1 from t group by grouping sets ((a), (b), (c));
>
> and so does this one:
>
> select 1 from t group by grouping sets ((a,c), (a,b));
>
> I'm no expert in grouping sets, but I'd bet this means we transform the
> grouping sets into a groupClause by extracting the keys. I haven't
> investigated why exactly we do this, but I'm sure there's a reason (e.g.
> it gives us SortGroupClause).
>
> You seem to believe a query can have both grouping sets and regular
> grouping at the same level - but how would such query look like? Surely
> you can't have two GROuP BY clauses. You can do
>
Translating into code:
gd is grouping sets and
parse->groupClause is regular grouping
and we cannot have both at the same time.

> select 1 from t group by a, grouping sets ((b), (c));
>
> which is however mostly equivalent to (AFAICS)
>
> select 1 from t group by grouping sets ((a,b), (a,c))
>
> so it's not like we have an independent groupClause either I think.
>
> The whole point of the original condition is - if there are grouping
> sets, check if at least one can be executed using hashing (i.e. all keys
> are hashable). Otherwise (without grouping sets) look at the grouping as
> a whole.
>
Or if gd is NULL check parse->groupClause.

> I don't see how your change improves this - if there are grouping sets,
> it's futile to look at the whole groupClause if at least one grouping
> set can't be hashed.
>
> But there's a simple way to disprove this - show us a case (table and a
> query) where your code correctly allows hashing while the current one
> does not.

I'm not an expert in grouping either.
The question I have here is whether gd is populated and has gd->
any_hashable as FALSE,
Its mean no use checking parse-> groupClause, it's a waste of time, Ok.

>
> >
> >> For hashing to be worth considering, at least one grouping set has to be
> >> hashable - otherwise it's pointless.
> >>
> >> Granted, you could have something like
> >>
> >> GROUP BY GROUPING SETS ((a), (b)), c
> >>
> >> which I think essentially says "add c to every grouping set" and that
> >> will be covered by the any_hashable check.
> >>
> >I am not going into the merit of whether or not it is worth hashing.
> >grouping_is_hashable as a last resort, must decide.
> >
>
> I don't know what this is supposed to say either. The whole point of
> this check is to simply skip construction of hash-based paths in cases
> when it's obviously futile (i.e. when some of the keys don't support
> hashing). We do this as early as possible, because the whole point is to
> save precious CPU cycles during query planning.
>

> >
> >> >Apparently gd pointer, will never be NULL there, verified with
> Assert(gd
> >> !=
> >> >NULL).
> >> >
> >>
> >> Um, what? If you add the assert right before the if condition, you won't
> >> even be able to do initdb. It's pretty clear it'll crash for any query
> >> without grouping sets.
> >>
> >Here not:
> >Assert(gd != NULL);
> >create_ordinary_grouping_paths(root, input_rel, grouped_rel,
> > agg_costs, gd, &extra,
> > &partially_grouped_rel);
> >
>
> I have no idea where you're adding this assert. But simply adding it to
> create_grouping_paths (right before the if condition changed by your
> patch) will trigger a failure during initdb. Simply because for queries
> without grouping sets (and with regular grouping) we pass gs=NULL.
>
> Try applying the attached patch and do "pg_ctl -D ... init" - you'll get
> a failure proving that gd=NULL.
>
Well, I did a test right now, downloaded the latest HEAD and added the
Assertive.
I compiled everything, ran the regression tests, installed the binaries,
loaded the server and installed a client database.
Everything is working.
Maybe in all these steps, there is no grouping that would trigger the code
in question, but I doubt it.

>
> >Otherwise Coverity would be right.
> >CID 1412604 (#1 of 1): Dereference after null check (FORWARD_NULL)13.
> >var_deref_model: Passing null pointer gd to
> create_ordinary_grouping_paths,
> >which dereferences it. [show details
> ><
> https://scan6.coverity.com/eventId=31389494-14&modelId=31389494-0&fileInstanceId=105740687&filePath=%2Fdll%2Fpostgres%2Fsrc%2Fbackend%2Foptimizer%2Fplan%2Fplanner.c&fileStart=4053&fileEnd=4180
> >]
> >
> >
> >Which cannot be true, gd is never NULL here.
> >
>
> Or maybe coverity simply does not realize the NULL-dereference can't
> happen, because it's guarded by other conditions, or something like
> that ... As the assert failure demonstrates, we do indeed get NULLs
> here, and it does not crash so coverity is missing something.
>
1. With Assert.
All works.

2. create_ordinary_grouping_paths is called with gd var.

3. create_ordinary_grouping_paths call get_number_of_groups

4. get_number_of_groups dereference gd pointer (line 3705).
foreach(lc, gd->rollups)

5. line (3701:
Assert(gd); /* keep Coverity happy */

Of course, this works, gd is never NULL here.

6. grouping_is_hashable is never called here, because gd is never NULL here.
if ((parse->groupClause != NIL &&
agg_costs->numOrderedAggs == 0 &&
(gd ? gd->any_hashable : grouping_is_hashable(parse->groupClause))))
flags |= GROUPING_CAN_USE_HASH;

The question is is it worth it or not worth calling (grouping_is_hashable),
at that point?

regards,
Ranier Vilela

Attachment Content-Type Size
full_with_assert.txt text/plain 20.1 KB
gd_is_null.patch application/octet-stream 422 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2020-09-21 00:10:30 Re: Planner, check if can use consider HASH for groupings (src/backend/optimizer/plan/planner.c)
Previous Message Tom Lane 2020-09-20 23:06:09 Re: Yet another fast GiST build