Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning
Date: 2018-01-16 08:08:50
Message-ID: 294e8400-6c40-87d5-078f-8713142032a0@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Hi David.

Thanks for the review.

On 2018/01/12 12:30, David Rowley wrote:
> I've got a few more things for you. I'm only partway through another
> pass, but it makes sense to post what I have now if you're working on
> a new version.
>
> 1. partitioing -> partitioning
>
> * Strategy of a partition clause operator per the partitioing operator class

Fixed.

> 2. get_partitions_from_clauses() modifies partclauses without
> mentioning it in the header. I think you need to either:
>
> a) warn about this in the header comment; or
> b) do a list_copy() before list_concat()
> c) do list_truncate back to the original length after you're done with the list.

Went with (b).

> 3. get_partitions_from_clauses_recurse(), with:
>
> result = bms_add_range(result, 0, partdesc->nparts - 1);
>
> You could change that to bms_add_range(NULL, ...) and ditch the
> assignment of result to NULL at the start of the function.

Done.

> 4. classify_partition_bounding_keys() now returns bool, but the return
> statement is still:
>
> return keys->n_eqkeys + keys->n_minkeys + keys->n_maxkeys + n_keynullness;
>
> my compiler didn't warn about that, but I'd imagine some might.

Oops, my bad.

>
> Instead, can you make it:
>
> if (keys->n_eqkeys > 0 || keys->n_minkeys > 0 ||
> keys->n_maxkeys > 0 || n_keynullness > 0)
> return true;
>
> return false;
>
> probably equal keys are the most likely case, so it'll be good to
> short circuit instead of performing addition on a bunch of stuff we
> don't care about anymore.

Changed it to what Robert suggested downthread.

> 5. In classify_partition_bounding_keys, why do we "continue" here?
>
> clause = rinfo->clause;
> if (rinfo->pseudoconstant &&
> !DatumGetBool(((Const *) clause)->constvalue))
> {
> *constfalse = true;
> continue;
> }
>
> Is there any point in searching further?
>
> Also, if you were consistent with the return value for
> classify_partition_bounding_keys when you've set *constfalse = true;
> you wouldn't need to handle the case twice like you are in
> get_partitions_from_clauses_recurse().

OK, I made classify_partition_bounding_keys() return true whenever set
*constfalse to true.

> 6. I think it would be nicer if get_partitions_from_ne_clauses returns
> a set of partitions that could be excluded.
>
> So instead of:
>
> * get_partitions_from_ne_clauses
> *
> * Return partitions of relation that satisfy all <> operator clauses in
> * ne_clauses. Only ever called if relation is a list partitioned table.
>
> Have:
>
> * get_partitions_from_ne_clauses
> *
> * Returns a Bitmapset of partitions that can be safely excluded due to
> * not-equal clauses existing for all possible partition values. It is only
> * valid to call this for LIST partitioned tables.
>
> and instead of:
>
> result = bms_add_range(NULL, 0, partdesc->nparts - 1);
> result = bms_del_members(result, excluded_parts);
> bms_free(excluded_parts);
>
> return result;
>
> Just do:
>
> return excluded_parts;
>
> and in get_partitions_from_clauses_recurse(), do bms_del_members
> instead of bms_int_members.
>
> there's less bit shuffling and it seems cleaner. Perhaps the function
> name would need to be changed if we're inverting the meaning too.
>
> (I've attached a patch which makes this change along with an idea in #8 below)

Thanks for the suggestions... (comment continues below)

> 7. The following comment claims the function sets *datum, but there's
> no param by that name:
>
> /*
> * partkey_datum_from_expr
> * Extract constant value from expr and set *datum to that value
> */
> static bool
> partkey_datum_from_expr(PartitionKey key, int partkeyidx,
> Expr *expr, Datum *value)

Fixed.

> 8. The code in get_partitions_from_ne_clauses() does perform quite a
> few nested loops. I think a more simple way to would be to track the
> offsets you've seen in a Bitmapset. This would save you having to
> check for duplicates, as an offset can only contain a single datum.
> You'd just need to build a couple of arrays after that, one to sum up
> the offsets found per partition, and one for the total datums allowed
> in the partition. If the numbers match then you can remove the
> partition.
>
> I've written this and attached it to this email. It saves about 50
> lines of code and should perform much better for complex cases, for
> example, a large NOT IN list. This also implements #6.

I liked your patch, so incorporated it, except, I feel slightly
uncomfortable about the new name that you chose for the function because
it sounds a bit generic. I mean the function solves a very specific
problem and have very strict requirements for calling it. It's not like
we could pass it just any partitioned relation and/or just any set of
clauses. It has to be a list-partitioned table and the list of clauses
must contain only the clauses containing compatible <> operators. Checks
for those requirements are carried out in yet another place, that is,
classify_partition_bounding_keys().

Perhaps we can live with that though, because it's not a publicly
available function, but someone might get confused in the future.

> 9. "the same" -> "it"
>
> /*
> * In case of NOT IN (..), we get a '<>', which while not
> * listed as part of any operator family, we are able to
> * handle the same if its negator is indeed a part of the
> * partitioning operator family.
> */

Done.

> 10. in classify_partition_bounding_keys: "0" -> "false"
>
> /* Return if no work to do below. */
> if (!will_compute_keys || *constfalse)
> return 0;
>
> Likewise for:
>
> if (*constfalse)
> return 0;

Have fixed these per an earlier comment in this email.

> 11. I don't see partition_bound_bsearch used anywhere below the
> following comment:
>
> * Generate bounding tuple(s).
> *
> * We look up partitions in the partition bound descriptor using, say,
> * partition_bound_bsearch(), which expects a Datum (or Datums if multi-
> * column key). So, extract the same out of the constant argument of
> * each clause.
>
> I also don't know what the comment is trying to say.

The comment is no longer very intelligible to me too. I just wanted to
say here that, *elsewhere*, we will use a function like
partition_bound_bsearch() to look up partitions from the clauses we
matched against individual partition key columns. That function expects
the lookup key to be in a Datum array form, not a list-of-clauses form.
So, we must construct the lookup key(s) by extracting constant values out
the clauses.

I tried to rewrite it that way. Hope that's a bit clearer.

> 12.
>
> * operator and sets *incl if equality is implied
>
> should be:
>
> * operator and set *incl to true if the operator's strategy is inclusive.

Done that way.

> 13. What does "the same" mean in:
>
> * and add this one directly to the result. Caller would
> * arbitrarily choose one of the many and perform
> * partition-pruning with the same. It's possible that mutual

It means "the one that caller would arbitrarily choose of the many that
this function will return to it". Anyway, I changed "the same" to "it".

> I think you quite often use "the same" to mean "it". Can you change that?

I guess that's just one of my many odd habits when writing English, all of
which I'm trying to get rid of, but apparently with limited success. Must
try harder. :)

> 14. Not sure what parameter you're talking about here.
>
> * Evaluate 'leftarg op rightarg' and set *result to its value.
> *
> * leftarg and rightarg referred to above actually refer to the constant
> * operand (Datum) of the clause contained in the parameters leftarg and
> * rightarg below, respectively. And op refers to the operator of the
> * clause contained in the parameter op below.

I rewrote the above comment block as:

* Try to compare the constant arguments of 'leftarg' and 'rightarg', in that
* order, using the operator of 'op' and set *result to the result of this
* comparison.

Is that any better?

> 15. "the latter" is normally used when you're referring to the last
> thing in a list which was just mentioned. In this case, leftarg_const
> and rightarg_const is the list, so "the latter" should mean
> rightarg_const, but I think you mean to compare them using the
> operator.
>
> * If the leftarg_const and rightarg_const are both of the type expected
> * by op's operator, then compare them using the latter.

Rewrote it as:

* We can compare leftarg_const and rightarg_const using op's operator
* only if both are of the type expected by it.

> 16. There are a few things to improve with the following comment:
>
> /*
> * Hash partitioning stores partition keys containing nulls in regular
> * partitions. That is, the code that determines the hash partition for
> * a given row admits nulls in the partition key when computing the key's
> * hash. So, here we treat any IS NULL clauses on partition key columns as
> * equality keys, along with any other non-null values coming from equality
> * operator clauses.
> */
>
> "admits" is not the correct word here, and "hash" should be "correct",
> but there are more mistakes, so might be easier just to rewrite to:
>
> /*
> * Since tuples with NULL values in the partition key columns are
> stored in regular partitions,
> * we'll treat any IS NULL clauses here as regular equality clauses.
> /*

Agreed that your version is better, so went with it.

> 17. The following example will cause get_partitions_for_keys_hash to misbehave:
>
> create table hashp (a int, b int) partition by hash (a, b);
> create table hashp1 partition of hashp for values with (modulus 4, remainder 0);
> create table hashp2 partition of hashp for values with (modulus 4, remainder 1);
> create table hashp3 partition of hashp for values with (modulus 4, remainder 3);
> create table hashp4 partition of hashp for values with (modulus 4, remainder 2);
> explain select * from hashp where a = 1 and a is null;
>
> The following code assumes that you'll never get a NULL test for a key
> that has an equality test, and ends up trying to prune partitions
> thinking we got compatible clauses for both partition keys.

Yeah, I think this example helps spot a problem. I thought we'd never get
to get_partitions_for_keys_hash() for the above query, because we
should've been able to prove much earlier that the particular clause
combination should be always false (a cannot be both non-null 1 and null).
Now, because the planner itself doesn't substitute a constant-false for
that, I taught classify_partition_bounding_keys() to do so. It would now
return constfalse=true if it turns out that clauses in the input list lead
to contradictory nullness condition for a given partition column.

> memset(keyisnull, false, sizeof(keyisnull));
> for (i = 0; i < partkey->partnatts; i++)
> {
> if (bms_is_member(i, keys->keyisnull))
> {
> keys->n_eqkeys++;
> keyisnull[i] = true;
> }
> }
>
> /*
> * Can only do pruning if we know all the keys and they're all equality
> * keys including the nulls that we just counted above.
> */
> if (keys->n_eqkeys == partkey->partnatts)
>
> The above code will need to be made smarter. It'll likely crash if you
> change "b" to a pass-by-ref type.

Hmm, not sure why. It seems to work:

create table hp (a int, b text) partition by hash (a, b);
create table hp1 partition of hp for values with (modulus 4, remainder 0);
create table hp2 partition of hp for values with (modulus 4, remainder 1);
create table hp3 partition of hp for values with (modulus 4, remainder 3);
create table hp4 partition of hp for values with (modulus 4, remainder 2);

insert into hp values (1, 'xxx');
INSERT 0 1

select tableoid::regclass, * from hp;
tableoid | a | b
----------+---+-----
hp1 | 1 | xxx
(1 row)

insert into hp (a) values (1);
INSERT 0 1

insert into hp (b) values ('xxx');
INSERT 0 1

select tableoid::regclass, * from hp where a is null;
tableoid | a | b
----------+---+-----
hp2 | | xxx
(1 row)

select tableoid::regclass, * from hp where b is null;
tableoid | a | b
----------+---+---
hp1 | 1 |
(1 row)

select tableoid::regclass, * from hp where a = 1 and b is null;
tableoid | a | b
----------+---+---
hp1 | 1 |
(1 row)

select tableoid::regclass, * from hp where a is null and b = 'xxx';
tableoid | a | b
----------+---+-----
hp2 | | xxx
(1 row)

> 18. The following code:
>
> int other_idx = -1;
>
> /*
> * Only a designated partition accepts nulls, which if there
> * exists one, return the same.
> */
> if (partition_bound_accepts_nulls(boundinfo) ||
> partition_bound_has_default(boundinfo))
> other_idx = partition_bound_accepts_nulls(boundinfo)
> ? boundinfo->null_index
> : boundinfo->default_index;
> if (other_idx >= 0)
> return bms_make_singleton(other_idx);
> else
> return NULL;
>
> should be simplified to:
>
> /*
> * NULLs may only exist in the NULL partition, or in the
> * default, if there's no NULL partition.
> */
> if (partition_bound_accepts_nulls(boundinfo))
> return bms_make_singleton(boundinfo->null_index);
> else if (partition_bound_has_default(boundinfo))
> return bms_make_singleton(boundinfo->default_index);
> return NULL;

Agreed, done that way.

> 19. "exists" -> "are"
>
> * If there are no datums to compare keys with, but there exist
> * partitions, it must be the default partition.
>
> also, instead of writing "it must be the default partition." it should
> be better to say "just return the default partition."

OK, done.

> 20. I don't think the return NULL should ever hit, is it worth putting
> a comment to say /* shouldn't happen */
>
> if (boundinfo->ndatums == 0)
> {
> if (partition_bound_has_default(boundinfo))
> return bms_make_singleton(boundinfo->default_index);
> else
> return NULL;
> }

I added a /* shouldn't happen */ comment next to return NULL.

> 21. Can the following comment does not explain the situation well:
>
> /*
> * boundinfo->ndatums - 1 is the last valid list partition datums
> * index.
> */
>
> There's really no possible non-default partition for this case, so
> perhaps we should just return the default, if one exists. We do go on
> to check the n_maxkeys needlessly for this case. At the very least the
> comment should be changed to:
>
> /*
> * minkeys values are greater than any non-default partition.
> * We'll check that for case below.
> */
>
> but I think it's worth just doing the default partition check there
> and returning it, or NULL. It should help reduce confusion.

Yep, done.

Attached v20. Thanks again.

Regards,
Amit

Attachment Content-Type Size
v20-0001-Some-interface-changes-for-partition_bound_-cmp-.patch text/plain 11.7 KB
v20-0002-Introduce-a-get_partitions_from_clauses.patch text/plain 65.0 KB
v20-0003-Move-some-code-of-set_append_rel_size-to-separat.patch text/plain 8.6 KB
v20-0004-More-refactoring-around-partitioned-table-Append.patch text/plain 12.7 KB
v20-0005-Teach-planner-to-use-get_partitions_from_clauses.patch text/plain 44.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message REIX, Tony 2018-01-16 08:25:51 RE:[HACKERS] Deadlock in XLogInsert at AIX
Previous Message Aleksandr Parfenov 2018-01-16 07:52:26 Re: PostgreSQL crashes with SIGSEGV