Re: [HACKERS] path toward faster partition pruning

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] path toward faster partition pruning
Date: 2017-12-28 02:07:49
Message-ID: CAKJS1f9+46P9119wpHo+M2=yYnzPbDpTDoPv2pT0-QqVPDW17Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 22 December 2017 at 17:25, Amit Langote
<Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Please find attached updated patches.

Hi Amit,

I've just completed a pass over the v17 patch set. I've found a number
of things that need to be addressed. Some might seem a bit nit-picky,
sorry about that. However, many of the others genuinely need to be
addressed.

1. The following code calls RelationGetPartitionQual, but the output
of that is only needed when partition_bound_has_default(boundinfo) is
true.

Can you change this to only get the RelationGetPartitionQual when it's required?

Bitmapset *
get_partitions_from_clauses(Relation relation, int rt_index,
ParamListInfo prmlist, ExprContext *econtext,
List *partclauses)
{
Bitmapset *result;
List *partconstr = RelationGetPartitionQual(relation);

2. The header comment in match_clauses_to_partkey() does not give any
warning that 'clauses' is modified within the function.

The function should create a copy of the clauses before modifying
them. This will save you having to do any list_copy calls when you're
calling the function.

The header comment is also not very clear about what the return value
of the function is.

3. "method" I think should be "strategy". We've pretty much
standardised on that term everywhere else, so let's keep to standard.

/*
* Does the query specify a key to be null or not null? Partitioning
* handles null partition keys specially depending on the partitioning
* method in use, we store this information.
*/

4. "relation" should be in single quotes, since you're talking about
the parameter named "relation". Likewise with "partclauses", otherwise
it just seems like bad English.

* Determine the set of partitions of relation that will satisfy all
* the clauses contained in partclauses

5. partdesc's assignment can be delayed until it's needed. This will
save generating it when constfalse == true

static Bitmapset *
get_partitions_from_clauses_recurse(Relation relation, int rt_index,
List *clauses)
{
PartitionDesc partdesc = RelationGetPartitionDesc(relation);

6. In the following comment, I'd have expected almost identical code,
just with some other List, but the code probably differs a bit too
much to use "Ditto".

/*
* Ditto, but this time or_clauses.
*/

7. Comment claims we use "set union", but we're really just collecting
the members from the other set in:

/*
* Partition sets obtained from mutually-disjunctive clauses are
* combined using set union.
*/
result = bms_add_members(result, arg_partset);

8. These arrays could just be initialized up to partkey->partnatts.
I'd imagine most of the time this will be just 1, so would save
needlessly setting the 31 other elements, although, perhaps it's a bad
idea to optimize this.

memset(keyclauses_all, 0, sizeof(keyclauses_all));
/* false means we don't know if a given key is null */
memset(keyisnull, false, sizeof(keyisnull));
/* false means we don't know if a given key is not null */
memset(keyisnotnull, false, sizeof(keyisnull));

The last two of these could just be Bitmapsets and you'd not need any
memset at all. PARTITION_MAX_KEYS just so happens to be the same as
BITS_PER_BITMAPWORD in a standard build, so you'd always be able to
mark everything with a single bitmap word. This would help a bit in
the various places that you're counting the true elements, for
example, the following code:

for (i = 0; i < partkey->partnatts; i++)
{
if (!keys->keyisnotnull[i])
{
include_def = true;
break;
}
}

could become:

include_def = !bms_is_empty(keys->keyisnotnull);

if you converted all these to Bitmapsets.

9. The following comment would be better read as: /* clause does not
match this partition key */

/* Clause not meant for this column. */

10. The following comment talks about handling less than operators for
hash opfamilies, but the code only handles <> for btree and list
partitioning.

* Handle some cases wherein the clause's operator may not
* belong to the partitioning operator family. For example,
* operators named '<>' are not listed in any operator
* family whatsoever. Also, ordering opertors like '<' are
* not listed in the hash operator family.

"opertors" should be spelled "operators"

11. In the following comment "operator" should be "operators":

* are constrained by clauses containing equality operator, unless hash

Likewise for:

* IS NULL clauses while remaining have clauses with equality operator.

12. The following code in classify_partition_bounding_keys probably
should use EXPR_MATCHES_PARTKEY.

/* Does leftop match with this partition key column? */
if ((IsA(arg, Var) &&
((Var *) arg)->varattno == partattno) ||
equal(arg, partexpr))

13. The comment for classify_partition_bounding_keys does not seem to
define well what the meaning of the return value is:

* classify_partition_bounding_keys
* Classify partition clauses into equal, min, and max keys, along with
* any Nullness constraints and return that information in the output
* argument keys (number of keys is the return value)

I had thought all along that this must mean the number of distinct
partition keys that we've found useful clauses for, but it's possible
to fool this with duplicate IS NULL checks.

create table hp (a int, b text) partition by hash (a, b);
create table hp0 partition of hp for values with (modulus 4, remainder 0);
create table hp3 partition of hp for values with (modulus 4, remainder 3);
create table hp1 partition of hp for values with (modulus 4, remainder 1);
create table hp2 partition of hp for values with (modulus 4, remainder 2);

explain select * from hp where a is null and a is null;

This causes classify_partition_bounding_keys to return 2. I can't see
any bad side effects of this, but I think the comment needs to be
improved. Perhaps it's better just to make the function return bool,
returning true if there are any useful keys?

14. The switch statement in partition_op_strategy could be simplified
a bit and be written more as:

case BTLessEqualStrategyNumber:
*incl = true;
/* fall through */
case BTLessStrategyNumber:
result = PART_OP_LESS;
break;

15. No default case statements in partition_op_strategy() could result
in "result" not being set.

16. I'm unsure what the following comment means:

* To set eqkeys, we must have found the same for partition key columns.

The word "for" seems wrong, or I'm not sure what it wants to ensure it
finds the same for.

17. Header comment for partition_op_strategy is out-of-date.

/*
* Returns -1, 0, or 1 to signify that the partitioning clause has a </<=,
* =, and >/>= operator, respectively. Sets *incl to true if equality is
* implied.
*/

It'll never return -1.

18. The following comment has some grammar issues:

/*
* If couldn't coerce to the partition key type, that is, the type of
* datums stored in PartitionBoundInfo, no hope of using this
* expression for anything partitioning-related.
*/

Would be better with:

/*
* If we couldn't coerce the partition key type, that is, the type
* of datums stored in PartitionBoundInfo, then there's no hope of
* using this expression for anything partitioning-related.
*/

19. In partkey_datum_from_expr() the Assert(false) at the end seems
overkill. You could just get rid of the default in the switch
statement and have it fall through to the final return. This would
save 4 lines of code.

20. The word "and" I think needs removed from the following comment:

* Couldn't compare; keep hash_clause set to the previous value and
* so add this one directly to the result. Caller would

Probably also needs a comment after "value"

21. The following comment seems to indicate 'cur' is an operator, but
it's a PartClause:

/* The code below is for btree operators, which cur is not. */

It might be better to write

/* The code below handles Btree operators which are not relevant to a
hash-partitioned table. */

22. "Stuff" should be "The code" in:

* Stuff that follows closely mimics similar processing done by

23. In remove_redundant_clauses() and various other places, you have a
variable named partattoff. What's the meaning of this name? I'd
imagine it is short for "partition attribute offset", but it's a
"partition key index", so why not "partkeyidx"?

Of course, you might claim that an array index is just an offset, but
it is a little confusing if you think of attribute offsets in a
TupleDesc.

24. get_partitions_for_keys() you're using "return" and "break" in the
switch statement. You can remove the breaks;

25. The Assert(false) in get_partitions_for_keys seems overkill. I
think it's fine to just do:

return NULL; /* keep compiler quiet */

26. The following code comment in get_partitions_for_keys_hash() does
not seem very well written:

* Hash partitioning handles puts nulls into a normal partition and
* doesn't require to define a special null-accpting partition.
* Caller didn't count nulls as a valid key; do so ourselves.

Maybe "puts" should be "storing"?

Also, does hash partitioning actually support a NULL partition? This
seems to say it doesn't require, but as far as I can see it does not
*support* a NULL partition. The comment is a bit misleading.
"accpting" should be "accepting".

27. In get_partitions_for_keys_hash() why is it possible to get a
result_index below 0? In that case, the code will end up triggering
the Assert(false), but if you want to Assert something here then maybe
it should be Assert(result_index >= 0)?

if (result_index >= 0)
return bms_make_singleton(result_index);

28. Is there any point in the following loop in get_partitions_for_keys_list()?

/*
* We might be able to get the answer sooner based on the nullness of
* keys, so get that out of the way.
*/
for (i = 0; i < partkey->partnatts; i++)
{

Won't partkey->partnatts always be 1? Maybe you can Assert that
instead, just in case someone forgets to change the code if LIST is to
support multiple partition keys.

29. In get_partitions_for_keys_list and get_partitions_for_keys_range
partdesc is only used for an Assert. Maybe you can just:

Assert(RelationGetPartitionDesc(rel)->nparts > 0);

30. The Assert(false) in get_partitions_for_keys_list() looks
suspiciously easy to hit...

create table lp (a int not null) partition by list(a);
create table lp1 partition of lp for values in(1);
explain select * from lp where a > 11; -- Assert fail!

31. ranget?

* get_partitions_for_keys_range
* Return partitions of a ranget partitioned table for requested keys

32. I'm not quite sure what this comment is saying:

/*
* eqoff is gives us the bound that is known to be <= eqkeys,
* given how partition_bound_bsearch works. The bound at eqoff+1,
* then, would be the upper bound of the only partition that needs
* to be scanned.
*/

33. "one" -> "the one"

* Find the leftmost bound that satisfies the query, i.e., one that
* satisfies minkeys.

34. Why "will" there be multiple partitions sharing the same prefix?

* If only a prefix of the whole partition key is provided, there will
* be multiple partitions whose bound share the same prefix. If minkey

Perhaps "will" should be "may"?

35. I think "would've" in the following comment is not correct. This
seems to indicate that this has already happened, which it has not.
"will have to be" might be more correct?

* satisfy the query, but don't have a valid partition assigned. The
* default partition would've been included to cover those values.

36. "will" -> "with"?

* Since partition keys will nulls are mapped to default range

37. "If no" -> "If there's no"

* If no tuple datum to compare with the bound, consider
* the latter to be greater.

38. I don't see anything about setting keynullness here:

/*
* Get the clauses that match the partition key, including information
* about any nullness tests against partition keys. Set keynullness to
* a invalid value of NullTestType, which 0 is not.
*/
partclauses = match_clauses_to_partkey(root, rel,
list_copy(rel->baserestrictinfo),
&contains_const,
&constfalse);

39. "paritions" -> "partitions"

* Else there are no clauses that are useful to prune any paritions,

40. In match_clauses_to_partkey, the following code pulls the varnos
from each operand of the expression. It would be better to just pull
the side that's needed (if any) a bit later rather than always doing
both.

left_relids = pull_varnos((Node *) leftop);
right_relids = pull_varnos((Node *) rightop);

41. I think a list_copy is missing here:

/*
* Accumulate the live partitioned children of this child, if it's
* itself partitioned rel.
*/
if (childrel->part_scheme)
partitioned_rels = list_concat(partitioned_rels,
childrel->live_partitioned_rels);

Surely if we don't perform a list_copy of
childrel->live_partitioned_rels then subsequent additions to the
resulting list will inadvertently add new items to the
childrel->live_partitioned_rels?

42. Is this because of an existing bug?

@@ -1906,11 +1904,13 @@ explain (costs off) select * from mcrparted
where abs(b) = 5; -- scans all parti
Filter: (abs(b) = 5)
-> Seq Scan on mcrparted3
Filter: (abs(b) = 5)
+ -> Seq Scan on mcrparted4
+ Filter: (abs(b) = 5)
-> Seq Scan on mcrparted5
Filter: (abs(b) = 5)

43. In partition_prune.sql you have a mix of /* */ and -- comments.
Please just use --

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-12-28 02:32:26 Re: pgsql: Add parallel-aware hash joins.
Previous Message Jeff Janes 2017-12-28 01:45:31 MCV lists for highly skewed distributions