Re: speeding up planning with partitions

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: "Imai, Yoshikazu" <imai(dot)yoshikazu(at)jp(dot)fujitsu(dot)com>, "'Amit Langote'" <amitlangote09(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>
Subject: Re: speeding up planning with partitions
Date: 2019-03-14 09:03:54
Message-ID: 256b9c87-b687-6f51-6274-19ae5743aef6@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Imai-san,

On 2019/03/13 19:34, Imai, Yoshikazu wrote:
> I have done code review of v31 patches from 0001 to 0004.
> I described below what I confirmed or thoughts.

Thanks for checking.

> 0001: This seems ok.
>
> 0002:
> * I don't really know that delaying adding resjunk output columns to the plan doesn't affect any process in the planner. From the comments of PlanRowMark, those columns are used in only the executor so I think delaying adding junk Vars wouldn't be harm, is that right?

I think so. The only visible change in behavior is that "rowmark" junk
columns are now placed later in the final plan's targetlist.

> 0003:
> * Is there need to do CreatePartitionDirectory() if rte->inh becomes false?
>
> + else if (rte->rtekind == RTE_RELATION && rte->inh)
> + {
> + rte->inh = has_subclass(rte->relid);
> +
> + /*
> + * While at it, initialize the PartitionDesc infrastructure for
> + * this query, if not done yet.
> + */
> + if (root->glob->partition_directory == NULL)
> + root->glob->partition_directory =
> + CreatePartitionDirectory(CurrentMemoryContext);
> + }

We need to have create "partition directory" before we access a
partitioned table's PartitionDesc for planning. These days, we rely
solely on PartitionDesc to determine a partitioned table's children. So,
we need to see the PartitionDesc before we can tell there are zero
children and set inh to false. In other words, we need the "partition
directory" to have been set up in advance.

> 0004:
> * In addition to commit message, this patch also changes the method of adjusting AppendRelInfos which reference to the subquery RTEs, and new one seems logically correct.

Do you mean I should mention it in the patch's commit message?

> * In inheritance_planner(), we do ChangeVarNodes() only for orig_rtable, so the codes concatenating lists of append_rel_list may be able to be moved before doing ChangeVarNodes() and then the codes concatenating lists of rowmarks, rtable and append_rel_list can be in one block (or one bunch).

Yeah, perhaps. I'll check.

On 2019/03/14 17:35, Imai, Yoshikazu wrote:> Amit-san,
> I have done code review of v31 patches from 0004 to 0008.
>
> 0004:
> * s/childern/children

Will fix.

> 0005:
> * This seems reasonable for not using a lot of memory in specific case,
> although it needs special looking of planner experts.

Certainly.

> 0006:
> * The codes initializing/setting RelOptInfo's part_rels looks like a bit
> complicated, but I didn't come up with any good design so far.

I guess you mean in add_appendrel_other_rels(). The other option is not
have that code there and expand partitioned tables freshly for every
target child, which we got rid of in patch 0004. But we don't want to do
that.

> 0007:
> * This changes some processes using "for loop" to using
> "while(bms_next_member())" which speeds up processing when we scan few
> partitions in one statement, but when we scan a lot of partitions in one
> statement, its performance will likely degraded. I measured the
> performance of both cases.
> I executed select statement to the table which has 4096 partitions.
>
> [scanning 1 partition]
> Without 0007 : 3,450 TPS
> With 0007 : 3,723 TPS
>
> [scanning 4096 partitions]
> Without 0007 : 10.8 TPS
> With 0007 : 10.5 TPS
>
> In the above result, performance degrades 3% in case of scanning 4096
> partitions compared before and after applying 0007 patch. I think when
> scanning a lot of tables, executor time would be also longer, so the
> increasement of planner time would be relatively smaller than it. So we
> might not have to care this performance degradation.

Well, live_parts bitmapset is optimized for queries scanning only few
partitions. It's true that it's slightly more expensive than a simple for
loop over part_rels, but not too much as you're also saying.

Thanks,
Amit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2019-03-14 09:04:11 Re: speeding up planning with partitions
Previous Message Kyotaro HORIGUCHI 2019-03-14 08:58:19 Re: why doesn't DestroyPartitionDirectory hash_destroy?