Re: [HACKERS] path toward faster partition pruning

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] path toward faster partition pruning
Date: 2017-12-20 08:27:20
Message-ID: 06cde8a5-0ac7-dcf5-ad66-1ca781623e0c@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Hello David.

Thanks for the reviews. Replying to all your emails here.

On 2017/12/19 13:36, David Rowley wrote:
> On 12 December 2017 at 22:13, Amit Langote wrote:
>> Attached updated patches.
>
> Hi Amit,
>
> I'm sorry to say this is another micro review per code I'm stumbling
> over when looking at the run-time partition pruning stuff.
>
> 1. In get_partitions_from_clauses_recurse(), since you're assigning
> the result to the first input, the following should use
> bms_add_members and not bms_union. The logical end result is the same,
> but using bms_union means a wasted palloc and a small memory leak
> within the memory context.
>
> /*
> * Partition sets obtained from mutually-disjunctive clauses are
> * combined using set union.
> */
> or_partset = bms_union(or_partset, arg_partset);

Done. Replaced with bms_add_members().

> 2. Also in get_partitions_from_clauses_recurse(), it might also be
> worth putting in a bms_free(or_partset) after:
>
> /*
> * Partition sets obtained from mutually-conjunctive clauses are
> * combined using set intersection.
> */
> result = bms_intersect(result, or_partset);

Done, too.

> Also, instead of using bms_intersect here, would it be better to do:
>
> result = bms_del_members(result, or_partset); ?
>
> That way you don't do a bms_copy and leak member for each OR branch
> since bms_intersect also does a bms_copy()
>
> The resulting set could end up with a few more trailing 0 words than
> what you have now, but it to be a better idea not allocate a new set
> each time.

You meant bms_int_members(), as you also said in your other email.

On 2017/12/19 14:42, David Rowley wrote:
> I was also wondering about your thoughts on the design of
> get_partitions_for_keys() and more generally how there are many
> functions which have some special treatment doing something based on
> ->strategy == PARTITION_STRATEGY_XXX.
>
> If I do:
>
> git grep PARTITION_STRATEGY -- src/backend/catalog/partition.c | wc -l
>
> I get 62 matches, most of which are case statements, and most of the
> remainder are things like if (key->strategy ==
> PARTITION_STRATEGY_HASH).
>
> git grep --show-function PARTITION_STRATEGY -- src/backend/catalog/
> partition.c
>
> shows that get_partitions_for_keys() is probably the most guilty of
> having the most strategy condition tests.

I notice that too now that you mention it.

>
> Also, if we look at get_partitions_for_keys() there's an unconditional:
>
> memset(hash_isnull, false, sizeof(hash_isnull));
>
> which is only used for PARTITION_STRATEGY_HASH, but LIST and RANGE
> must pay the price of that memset.

Although I know you're talking about something else here (about which I
say below), turns out this hash_isnull was completely unnecessary, so I
got rid of it.

> Perhaps it's not expensive enough
> to warrant only doing that when partkey->strategy ==
> PARTITION_STRATEGY_HASH, but it does make me question if we should
> have 3 separate functions for this and just have a case statement to
> call the correct one.
>
> I think if we were to put this off as something we'll fix later, then
> the job would just become harder and harder as time goes on.
>
> It might have been fine when we just had RANGE and LIST partitioning,
> but I think HASH really tips the scales over to this being needed.
>
> What do you think?

I think I somewhat understand your concern with regard to future additions
and maintenance and also now tend to agree.

I tried dividing up get_partitions_for_keys() into one function each for
hash, list, and range and it looks like in the attached. I think I like
the result. Each function has to deal with only query keys and bounds
assuming a given partitioning method and that appears to add to the
overall clarity of the code.

On 2017/12/19 22:44, David Rowley wrote:
> Again, another micro review. I apologise for the slow trickle of
> review. Again, these are just things I'm noticing while reading
> through while thinking of the run-time pruning patch.
>
>
> 1. The following Assert appears to be testing for the presence of
> cosmic rays :-)
>
> /*
> * Determine set of partitions using provided keys, which proceeds in a
> * manner determined by the partitioning method.
> */
> if (keys->n_eqkeys == partkey->partnatts)
> {
> Assert(keys->n_eqkeys == partkey->partnatts);
>
> Perhaps it's misplaced during a rewrite? Should be safe enough to
> remove it, I'd say.

I noticed that too and took care of it. :)

> 2. The following code in classify_partition_bounding_keys() misses
> looking under the RelabelType for rightarg:
>
> leftop = (Expr *) get_leftop(clause);
> if (IsA(leftop, RelabelType))
> leftop = ((RelabelType *) leftop)->arg;
> rightop = (Expr *) get_rightop(clause);
> if (EXPR_MATCHES_PARTKEY(leftop, partattno, partexpr))
> constexpr = rightop;
> else if (EXPR_MATCHES_PARTKEY(rightop, partattno, partexpr))
> constexpr = leftop;
>
> This breaks the following case:
>
> create table thisthat (a varchar not null) partition by list (a);
> create table this partition of thisthat for values in('this');
> create table that partition of thisthat for values in('that');
> explain select * from thisthat where 'this' = a; -- does not work
> QUERY PLAN
> ------------------------------------------------------------
> Append (cost=0.00..54.00 rows=14 width=32)
> -> Seq Scan on that (cost=0.00..27.00 rows=7 width=32)
> Filter: ('this'::text = (a)::text)
> -> Seq Scan on this (cost=0.00..27.00 rows=7 width=32)
> Filter: ('this'::text = (a)::text)
> (5 rows)
>
> explain select * from thisthat where a = 'this'; -- works as we look
> through the RelabelType on left arg.
> QUERY PLAN
> ------------------------------------------------------------
> Append (cost=0.00..27.00 rows=7 width=32)
> -> Seq Scan on this (cost=0.00..27.00 rows=7 width=32)
> Filter: ((a)::text = 'this'::text)

Thanks for pointing it out, fixed.

> 3. The follow code assumes there will be a commutator for the operator:
>
> if (constexpr == rightop)
> pc->op = opclause;
> else
> {
> OpExpr *commuted;
>
> commuted = (OpExpr *) copyObject(opclause);
> commuted->opno = get_commutator(opclause->opno);
> commuted->opfuncid = get_opcode(commuted->opno);
> commuted->args = list_make2(rightop, leftop);
> pc->op = commuted;
> }
>
> I had to hunt for it, but it appears that you're pre-filtering clauses
> with the Consts on the left and no valid commutator in
> match_clauses_to_partkey. I think it's likely worth putting a comment
> to mention that reversed clauses with no commutator should have been
> filtered out beforehand. I'd say it's also worthy of an Assert().

Yeah, added a comment and an Assert.

>
> 4. The spelling of "arbitrary" is incorrect in:
>
> * partattno == 0 refers to arbirtary expressions, which get the

Fixed.

> 5. I've noticed that partition pruning varies slightly from constraint
> exclusion in the following case:
>
> create table ta (a int not null) partition by list (a);
> create table ta1 partition of ta for values in(1,2);
> create table ta2 partition of ta for values in(3,4);
>
> explain select * from ta where a <> 1 and a <> 2; -- partition ta1 is
> not eliminated.
> QUERY PLAN
> -------------------------------------------------------------
> Append (cost=0.00..96.50 rows=5050 width=4)
> -> Seq Scan on ta1 (cost=0.00..48.25 rows=2525 width=4)
> Filter: ((a <> 1) AND (a <> 2))
> -> Seq Scan on ta2 (cost=0.00..48.25 rows=2525 width=4)
> Filter: ((a <> 1) AND (a <> 2))
> (5 rows)
>
>
> alter table ta1 add constraint ta1_chk check (a in(1,2)); -- add a
> check constraint to see if can be removed.
> explain select * from ta where a <> 1 and a <> 2; -- it can.
> QUERY PLAN
> -------------------------------------------------------------
> Append (cost=0.00..48.25 rows=2525 width=4)
> -> Seq Scan on ta2 (cost=0.00..48.25 rows=2525 width=4)
> Filter: ((a <> 1) AND (a <> 2))
> (3 rows)

I see. It seems that the current approach of handling <> operators by
turning clauses containing the same into (key > const OR key < const)
doesn't always work. I think I had noticed that for list partitioning at
least. I will work on alternative way of handling that in the next
version of the patch.

Meanwhile, please find attached patches (v15) that take care of the rest
of the comments. Most of the updates are to patch 0002, compared to the
last (v14) version.

Thanks again for your thoughtful review comments.

Thanks,
Amit

Attachment Content-Type Size
0001-Some-interface-changes-for-partition_bound_-cmp-bsea-v15.patch text/plain 11.6 KB
0002-Introduce-a-get_partitions_from_clauses-v15.patch text/plain 57.3 KB
0003-Move-some-code-of-set_append_rel_size-to-separate-fu-v15.patch text/plain 8.6 KB
0004-More-refactoring-around-partitioned-table-AppendPath-v15.patch text/plain 13.1 KB
0005-Teach-planner-to-use-get_partitions_from_clauses-v15.patch text/plain 40.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2017-12-20 08:38:51 Re: explain analyze output with parallel workers - question about meaning of information for explain.depesz.com
Previous Message Fabien COELHO 2017-12-20 07:36:27 Re: General purpose hashing func in pgbench