Re: Hash support for grouping sets

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Mark Dilger <hornschnorter(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hash support for grouping sets
Date: 2017-03-23 05:09:46
Message-ID: 8760j0abpl.fsf@news-spur.riddles.org.uk
Lists: pgsql-hackers

>>>>> "Andres" == Andres Freund <andres(at)anarazel(dot)de> writes:

>> - Assert(newphase == 0 || newphase == aggstate->current_phase + 1);
>> + Assert(newphase <= 1 || newphase == aggstate->current_phase + 1);

Andres> I think this somewhere in the file header needs an expanded
Andres> explanation about what these "special" phases mean.

I could move most of the block comment for ExecInitAgg up into the file
header; would that be better?

Andres> I've not yet read up on the thread, or the whole patch, but an
Andres> explanation of the memory management for all those tables would
Andres> be good somewhere (maybe I've just not read that far).

I can expand on that.

Andres> "mixed types" sounds a bit weird.

changing to "if there are both types present"

>> + values = palloc((1 + max_weight) * sizeof(double));
>> + sets = palloc((1 + max_weight) * sizeof(Bitmapset *));

Andres> We usually cast the result of palloc.

A rough count in the backend finds ~400 palloc calls without casts versus
~1350 with, so this doesn't seem to have been consistently enforced.
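
For reference, the two styles under discussion look like this (a standalone
sketch with malloc standing in for palloc; in C the cast is optional because
void * converts implicitly, the cast merely documents the intended type):

```c
#include <stdlib.h>
#include <assert.h>

/* uncast style: relies on the implicit conversion from void * */
static double *
alloc_values(int max_weight)
{
    return malloc((1 + max_weight) * sizeof(double));
}

/* cast style, matching the majority of existing backend code */
static double *
alloc_values_cast(int max_weight)
{
    return (double *) malloc((1 + max_weight) * sizeof(double));
}
```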

>> +static void
>> +_outGroupingSetData(StringInfo str, const GroupingSetData *node)
>> +{
>> + WRITE_NODE_TYPE("GSDATA");
>> +
>> + WRITE_NODE_FIELD(set);
>> + WRITE_FLOAT_FIELD(numGroups, "%.0f");
>> +}

Andres> .0f?

Copied from every other node that uses "%.0f" to write out a numGroups
value.

>> +static grouping_sets_data *
>> +preprocess_grouping_sets(PlannerInfo *root)
>> +{

Andres> It'd not hurt to add a comment as to what this function is
Andres> actually doing.

Sure.

Andres> I hope there's tests for both these branches.

A number of tests for both unsortable and unhashable cases are in the
regression test.

>> + /*
>> + * Is it hashable? We pretend empty sets are hashable even though we
>> + * actually force them not to be hashed later. But don't bother if
>> + * there's nothing but empty sets.
>> + */

Andres> Why?

If there's no non-empty grouping set then there's nothing to hash (the
hash code assumes at least one column). I'll put that in the comment.

>> @@ -3214,6 +3407,11 @@ get_number_of_groups(PlannerInfo *root,
>> * estimate_hashagg_tablesize
>> * estimate the number of bytes that a hash aggregate hashtable will
>> * require based on the agg_costs, path width and dNumGroups.
>> + *
>> + * XXX this may be over-estimating the size now that hashagg knows to omit
>> + * unneeded columns from the hashtable. Also for mixed-mode grouping sets,
>> + * grouping columns not in the hashed set are counted here even though hashagg
>> + * won't store them. Is this a problem?
>> */

Andres> Hm. This seems like it could move us to use sorting, even if
Andres> not actually warranted.

This is largely a pre-existing issue that this patch doesn't try to fix;
that would be a matter for a separate patch, I think. I merely added the
comment to point it out.

Andres> What's the newline after the comment about?

Old style habits die hard.

>> + RelOptInfo *grouped_rel,
>> + Path *path,
>> + bool is_sorted,
>> + bool can_hash,
>> + PathTarget *target,
>> + grouping_sets_data *gd,
>> + const AggClauseCosts *agg_costs,
>> + double dNumGroups)
>> +{
>> + Query *parse = root->parse;
>> +
>> + /*
>> + * If we're not being offered sorted input, then only consider plans that
>> + * can be done entirely by hashing.
>> + *
>> + * We can hash everything if it looks like it'll fit in work_mem. But if
>> + * the input is actually sorted despite not being advertised as such, we
>> + * prefer to make use of that in order to use less memory.
>> + * If none of the grouping sets are sortable, then ignore the work_mem
>> + * limit and generate a path anyway, since otherwise we'll just fail.
>> + */
>> + if (!is_sorted)
>> + {
>> + List *new_rollups = NIL;
>> + RollupData *unhashable_rollup = NULL;
>> + List *sets_data;
>> + List *empty_sets_data = NIL;
>> + List *empty_sets = NIL;
>> + ListCell *lc;
>> + ListCell *l_start = list_head(gd->rollups);
>> + AggStrategy strat = AGG_HASHED;
>> + Size hashsize;
>> + double exclude_groups = 0.0;
>> +
>> + Assert(can_hash);
>> +
>> + if (pathkeys_contained_in(root->group_pathkeys, path->pathkeys))
>> + {
>> + unhashable_rollup = lfirst(l_start);

Andres> a) cast result of lfirst/lnext/whatnot.

casting lfirst is defensible (and only about 10% of existing uses don't
cast it), but why would you ever cast the result of lnext?

Andres> b) Why is this guaranteed to be unhashable?

It might not be unhashable; that probably needs a better name. It's the
rollup that we aren't going to hash because we're being presented with
input already in correct order. Also, it's the only rollup that can
contain empty grouping sets, which the hash code can't cope with.

Obviously this needs a comment and some consideration of naming.

>> + if (hashsize > work_mem * 1024L && gd->rollups)
>> + return; /* nope, won't fit */

Andres> Didn't you above talk about generating a path anyway? Or is
Andres> rollups guaranteed not to be set in that case (if so, not
Andres> obvious on a casual read)?

Unsortable groups can't participate in rollups. The presence of a rollup
implies that we're going to be offered at least one is_sorted input
path.

>> + if (can_hash && gd->any_hashable)
>> + {
>> + List *rollups = NIL;
>> + List *hash_sets = list_copy(gd->unsortable_sets);
>> + double availspace = (work_mem * 1024.0);

Andres> Why the parens, and why the .0?

The .0 is because the multiplication should be done as a double, not an
int. This isn't an uncommon idiom; there are quite a few similar uses in
the planner alone.

The parens because I thought it was clearer that way.
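
To spell out why the ".0" matters: work_mem is an int measured in kB, so
"work_mem * 1024" is evaluated in int arithmetic and can overflow for large
settings, while "work_mem * 1024.0" promotes to double before multiplying.
A minimal standalone illustration (function names are mine, not the patch's):

```c
#include <assert.h>

static double
availspace_bad(int work_mem_kb)
{
    return work_mem_kb * 1024;      /* int multiply: wraps for large work_mem */
}

static double
availspace_good(int work_mem_kb)
{
    return work_mem_kb * 1024.0;    /* promoted to double: no overflow */
}
```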

Andres> I think you need a higher level explanation of how you're
Andres> mapping the work_mem issue to knapsack somewhere.

The general overview ("work_mem is the knapsack size, and we need to
figure out how best to pack it with hashtables"), or the specific issue
of the scale factor for discrete chunking of the size?
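
To make the general overview concrete: work_mem (scaled into discrete
chunks) is the knapsack capacity, each candidate hashtable is an item
weighted by its estimated size in chunks, and its value is what we gain by
hashing it (e.g. a sort avoided). The palloc'd values/sets arrays in the
quoted hunk, sized (1 + max_weight), are the per-weight DP tables for
exactly this kind of search. A minimal standalone sketch of the DP (names
and chunking are illustrative, not the patch's actual code):

```c
#include <assert.h>

#define MAX_CHUNKS 64           /* illustrative cap on capacity */

/*
 * Classic 0/1 knapsack: best[w] holds the best total value achievable
 * within weight w.  Iterate items, scanning weights downward so each
 * item is used at most once.
 */
static double
knapsack_best_value(int capacity, int nitems,
                    const int weights[], const double values[])
{
    double best[MAX_CHUNKS + 1] = {0.0};

    if (capacity > MAX_CHUNKS)
        capacity = MAX_CHUNKS;

    for (int i = 0; i < nitems; i++)
        for (int w = capacity; w >= weights[i]; w--)
            if (best[w - weights[i]] + values[i] > best[w])
                best[w] = best[w - weights[i]] + values[i];

    return best[capacity];
}
```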

Andres> Hm. So we're solely optimizing for the number of sorts, rather
Andres> the cost of the sorts. Probably an acceptable tradeoff.

The existing cost model for sorts doesn't actually seem to help us here;
all of our sorts sort the same data, just with different comparison
columns, but cost_sort doesn't actually account for that (all its
callers except the CLUSTER planning code pass in 0.0 for
comparison_cost).

If the sort costs were correct, we could compute a "value" for each
rollup that reflected the cost difference between the sort+unique
comparisons for it, and the hash functions that would be called if we
hashed instead; and just pass that to the knapsack function.
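
That hypothetical value computation would amount to something like the
following (the function name and formula are purely illustrative of the
idea, not code from the patch): the knapsack value of a rollup is the work
avoided by hashing it rather than sorting.

```c
#include <assert.h>

/*
 * Value of hashing one rollup instead of sorting it, assuming cost_sort
 * charged a real comparison_cost: sort cost plus the group-boundary
 * comparisons saved, minus the hash-function calls incurred.
 */
static double
rollup_hash_value(double sort_cost,        /* cost to sort the input */
                  double unique_cmp_cost,  /* group-boundary comparisons */
                  double hash_func_cost)   /* hash calls if hashed instead */
{
    return (sort_cost + unique_cmp_cost) - hash_func_cost;
}
```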

--
Andrew (irc:RhodiumToad)
