Re: path toward faster partition pruning

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: sulamul(at)gmail(dot)com, david(dot)rowley(at)2ndquadrant(dot)com, robertmhaas(at)gmail(dot)com, dilipbalaut(at)gmail(dot)com, rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com, memissemerson(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: path toward faster partition pruning
Date: 2017-11-13 09:46:28
Message-ID: 8460a6c3-68c5-b78a-7d18-d253180f2188@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Horiguchi-san,

Thanks for taking a look. Replying to all your emails here.

On 2017/11/10 12:30, Kyotaro HORIGUCHI wrote:
> In 0002, bms_add_range has a bit naive-looking loop
>
> + while (wordnum <= uwordnum)
> + {
> + bitmapword mask = (bitmapword) ~0;
> +
> + /* If working on the lower word, zero out bits below 'lower'. */
> + if (wordnum == lwordnum)
> + {
> + int lbitnum = BITNUM(lower);
> + mask >>= lbitnum;
> + mask <<= lbitnum;
> + }
> +
> + /* Likewise, if working on the upper word, zero bits above 'upper' */
> + if (wordnum == uwordnum)
> + {
> + int ushiftbits = BITS_PER_BITMAPWORD - (BITNUM(upper) + 1);
> + mask <<= ushiftbits;
> + mask >>= ushiftbits;
> + }
> +
> + a->words[wordnum++] |= mask;
> + }
>
> Without some aggressive optimization, the loop takes most of the
> time to check-and-jump for nothing especially with many
> partitions and somewhat unintuitive.
>
> The following uses a bit tricky bitmap operation but
> is straightforward as a whole.
>
> =====
> /* fill the bits upper from BITNUM(lower) (0-based) of the first word */
> a->workds[wordnum++] += ~(bitmapword)((1 << BITNUM(lower)) - 1);
>
> /* fill up intermediate words */
> while (wordnum < uwordnum)
> a->words[wordnum++] = ~(bitmapword) 0;
>
> /* fill up to BITNUM(upper) bit (0-based) of the last word */
> a->workds[wordnum++] |=
> (~(bitmapword) 0) >> (BITS_PER_BITMAPWORD - (BITNUM(upper) - 1));
> =====

Considering also the David's comment downthread, I will try to incorporate
this into bms_add_range().

> In 0003,
>
> +match_clauses_to_partkey(RelOptInfo *rel,
> ...
> + if (rinfo->pseudoconstant &&
> + (IsA(clause, Const) &&
> + ((((Const *) clause)->constisnull) ||
> + !DatumGetBool(((Const *) clause)->constvalue))))
> + {
> + *constfalse = true;
> + continue;
>
> If we call this function in both conjunction and disjunction
> context (the latter is only in recursive case). constfalse ==
> true means no need of any clauses for the former case.
>
> Since (I think) just a list of RestrictInfo is expected to be
> treated as a conjunction (it's quite doubious, though..),

I think it makes sense to consider a list of RestrictInfo's, such as
baserestrictinfo, that is passed as input to match_clauses_to_partkey(),
to be mutually conjunctive for our purpose here.

> we
> might be better call this for each subnodes of a disjunction. Or,
> like match_clauses_to_index, we might be better provide
> match_clause_to_partkey(rel, rinfo, contains_const), which
> returns NULL if constfalse. (I'm not self-confident on this..)

After reading your comment, I realized that it was wrong that the
recursive call to match_clauses_to_partkey() passed the arguments of an OR
clause all at once. That broke the assumption mentioned above that all of
the clauses in the list passed to match_clauses_to_partkey() are mutually
conjunctive. Instead, we must make a single-member list for each of the
OR clause's arguments and pass the same.

Then if we got constfalse for all of the OR's arguments, then we return
the constfalse=true to the original caller.

>
> + /*
> + * If no commutator exists, cannot flip the qual's args,
> + * so give up.
> + */
> + if (!OidIsValid(expr_op))
> + continue;
>
> I suppose it's better to leftop and rightop as is rather than
> flipping over so that var is placed left-side. Does that make
> things so complex?

Reason to do it that way is that the code placed far away (code in
partition.c that extracts constant values to use for pruning from matched
clauses) can always assume that the clauses determined to be useful for
partition-pruning always come in the 'partkey op constant' form.

> + * It the operator happens to be '<>', which is never listed
> If?

Right, will fix.

> + if (!op_in_opfamily(expr_op, partopfamily))
> + {
> + Oid negator = get_negator(expr_op);
> +
> + if (!OidIsValid(negator) ||
> + !op_in_opfamily(negator, partopfamily))
> + continue;
>
> classify_partition_bounding_keys() checks the same thing by
> checking whether the negator's strategy is
> BTEquealStrategyNumber. (I'm not sure the operator is guaranteed
> to be of btreee, though..) Aren't they needed to be in similar
> way?

You're right. The <>'s negator may not always be a btree operator. So,
we should add a check in match_clauses_to_partkey() that list or range
partitioning is in use, because only those require a btree operator
family. We now have hash partitioning, so need to be careful not to make
the assumption that all partitioning operators are from btree operator
families.

If match_clauses_to_partkey() accepts such a clause with a <> operator,
then classify_partition_bounding_keys() can rely that it will get the
desired btree operators to implement pruning for the same.

> # In the function, "partkey strategy" and "operator strategy" are
> # confusing..

I agree it would be better to make that clear using comments.

Partitioning strategy and operator strategies are intimately related.
List and range partitioning related optimizations will only work if the
clause operators are of valid btree strategies, hash partitioning
optimizations will only work if the operator in the matched clauses is a
valid hash equality operator.

> + AttrNumber attno;
>
> This declaration might be better in a narrower scope.

Agreed, will move.

On 2017/11/10 14:38, Kyotaro HORIGUCHI wrote:
> Hello, this is the second part of the review.
>
> At Fri, 10 Nov 2017 12:30:00 +0900 , Kyotaro HORIGUCHI wrote:
>> In 0003,
>
> In 0004,
>
> The name get_partitions_from_clauses_guts(), it seems to me that
> we usually use _internal for recursive part of some function. (I
> have the same comment as David about the comment for
> get_partition_from_clause())

OK, will replace _guts by _internal. Looking at the David's comments too.

> About the same function:
>
> Couldn't we get out in the fast path when clauses == NIL?

Actually get_partitions_for_clauses() won't be called at all if that were
true. I think I should add an Assert that clauses is not NIL.

>
> + /* No constraints on the keys, so, return *all* partitions. */
> + result = bms_add_range(result, 0, partdesc->nparts - 1);
>
> This allows us to return immediately here. And just above this,
>
> + if (nkeys > 0 && !constfalse)
> + result = get_partitions_for_keys(relation, &keys);
> + else if (!constfalse)
>
> Those two conditions are not orthogonal. Maybe something like
> following seems more understantable.
>
>> if (!constfalse)
>> {
>> /* No constraints on the keys, so, return *all* partitions. */
>> if (nkeys == 0)
>> return bms_add_range(result, 0, partdesc->nparts - 1);
>>
>> result = get_partitions_for_keys(relation, &keys);
>> }

Agreed that your suggested rewrite of that portion of the code is easy to
read, but we cannot return yet in the nkeys == 0 case, as you also said
you found out. I quote your other reply further below.

> I'm not sure what is meant to be (formally) done here, but it
> seems to me that OrExpr is assumed to be only at the top level of
> the caluses. So the following (just an example, but meaningful
> expression in this shpape must exists.) expression is perhaps
> wrongly processed here.
>
> CREATE TABLE p (a int) PARITION BY (a);
> CREATE TABLE c1 PARTITION OF p FOR VALUES FROM (0) TO (10);
> CREATE TABLE c2 PARTITION OF p FOR VALUES FROM (10) TO (20);
>
> SELECT * FROM p WHERE a = 15 AND (a = 15 OR a = 5);
>
> get_partitions_for_keys() returns both c1 and c2 and still
> or_clauses here holds (a = 15 OR a = 5) and the function tries to
> *add* partitions for a = 15 and a = 5 separetely.
>
> I'd like to pause here.

[ ... ]

On 2017/11/10 14:44, Kyotaro HORIGUCHI wrote:
> At Fri, 10 Nov 2017 14:38:11 +0900, Kyotaro HORIGUCHI wrote:
>
> This is working fine. Sorry for the bogus comment.

I'd almost started looking around if something might be wrong after all. :)

On 2017/11/10 16:07, Kyotaro HORIGUCHI wrote:
> At Fri, 10 Nov 2017 14:44:55 +0900, Kyotaro HORIGUCHI wrote:
>
>>> Those two conditions are not orthogonal. Maybe something like
>>> following seems more understantable.
>>>
>>>> if (!constfalse)
>>>> {
>>>> /* No constraints on the keys, so, return *all* partitions. */
>>>> if (nkeys == 0)
>>>> return bms_add_range(result, 0, partdesc->nparts - 1);
>>>>
>>>> result = get_partitions_for_keys(relation, &keys);
>>>> }
>
> So, the condition (!constfalse && nkeys == 0) cannot return
> there. I'm badly confused by the variable name.

Do you mean by 'constfalse'?

> I couldn't find another reasonable structure using the current
> classify_p_b_keys(), but could you add a comment like the
> following as an example?

OK, will add comments explaining what's going on.

Will post the updated patches after also taking care of David's and Amul's
review comments upthread.

Thanks,
Amit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Anthony Bykov 2017-11-13 10:00:28 Re: Jsonb transform for pl/python
Previous Message Jonathan Jacobson 2017-11-13 09:14:01 Re: 10beta1 sequence regression failure on sparc64