RE: speeding up planning with partitions

From: "Imai, Yoshikazu" <imai(dot)yoshikazu(at)jp(dot)fujitsu(dot)com>
To: 'Amit Langote' <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, 'Amit Langote' <amitlangote09(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, "Jesper Pedersen" <jesper(dot)pedersen(at)redhat(dot)com>
Subject: RE: speeding up planning with partitions
Date: 2019-03-15 00:33:42
Message-ID: 0F97FA9ABBDBE54F91744A9B37151A5129E604@g01jpexmbkw24
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Amit-san,

On Thu, Mar 14, 2019 at 9:04 AM, Amit Langote wrote:
> > 0002:
> > * I don't really know that delaying adding resjunk output columns to
> the plan doesn't affect any process in the planner. From the comments
> of PlanRowMark, those columns are used in only the executor so I think
> delaying adding junk Vars wouldn't be harm, is that right?
>
> I think so. The only visible change in behavior is that "rowmark" junk
> columns are now placed later in the final plan's targetlist.

ok, thanks.

> > 0003:
> > * Is there need to do CreatePartitionDirectory() if rte->inh becomes
> false?
> >
> > + else if (rte->rtekind == RTE_RELATION && rte->inh)
> > + {
> > + rte->inh = has_subclass(rte->relid);
> > +
> > + /*
> > + * While at it, initialize the PartitionDesc
> infrastructure for
> > + * this query, if not done yet.
> > + */
> > + if (root->glob->partition_directory == NULL)
> > + root->glob->partition_directory =
> > +
> CreatePartitionDirectory(CurrentMemoryContext);
> > + }
>
> We need to have create "partition directory" before we access a
> partitioned table's PartitionDesc for planning. These days, we rely
> solely on PartitionDesc to determine a partitioned table's children. So,
> we need to see the PartitionDesc before we can tell there are zero children
> and set inh to false. In other words, we need the "partition directory"
> to have been set up in advance.

Ah, I see.

> > 0004:
> > * In addition to commit message, this patch also changes the method
> of adjusting AppendRelInfos which reference to the subquery RTEs, and
> new one seems logically correct.
>
> Do you mean I should mention it in the patch's commit message?

Actually I firstly thought that it's better to mention it in the patch's commit message, but I didn't mean that here. I only wanted to state that I confirmed new method of adjusting AppendRelInfos seems logically correct :)

> > * In inheritance_planner(), we do ChangeVarNodes() only for orig_rtable,
> so the codes concatenating lists of append_rel_list may be able to be
> moved before doing ChangeVarNodes() and then the codes concatenating
> lists of rowmarks, rtable and append_rel_list can be in one block (or
> one bunch).
>
> Yeah, perhaps. I'll check.
>
> On 2019/03/14 17:35, Imai, Yoshikazu wrote:> Amit-san,
> > I have done code review of v31 patches from 0004 to 0008.
> >
> > 0004:
> > * s/childern/children
>
> Will fix.
>
> > 0005:
> > * This seems reasonable for not using a lot of memory in specific
> > case, although it needs special looking of planner experts.
>
> Certainly.

Thanks for these.

> > 0006:
> > * The codes initializing/setting RelOptInfo's part_rels looks like a
> > bit complicated, but I didn't come up with any good design so far.
>
> I guess you mean in add_appendrel_other_rels(). The other option is not
> have that code there and expand partitioned tables freshly for every
> target child, which we got rid of in patch 0004. But we don't want to
> do that.

Yeah, that's right.

> > 0007:
> > * This changes some processes using "for loop" to using
> > "while(bms_next_member())" which speeds up processing when we scan few
> > partitions in one statement, but when we scan a lot of partitions in
> > one statement, its performance will likely degraded. I measured the
> > performance of both cases.
> > I executed select statement to the table which has 4096 partitions.
> >
> > [scanning 1 partition]
> > Without 0007 : 3,450 TPS
> > With 0007 : 3,723 TPS
> >
> > [scanning 4096 partitions]
> > Without 0007 : 10.8 TPS
> > With 0007 : 10.5 TPS
> >
> > In the above result, performance degrades 3% in case of scanning 4096
> > partitions compared before and after applying 0007 patch. I think when
> > scanning a lot of tables, executor time would be also longer, so the
> > increasement of planner time would be relatively smaller than it. So
> > we might not have to care this performance degradation.
>
> Well, live_parts bitmapset is optimized for queries scanning only few
> partitions. It's true that it's slightly more expensive than a simple
> for loop over part_rels, but not too much as you're also saying.

Thanks for the comments.

--
Yoshikazu Imai

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-03-15 00:49:42 Re: Offline enabling/disabling of data checksums
Previous Message Tsunakawa, Takayuki 2019-03-15 00:27:14 RE: Timeout parameters