Re: [HACKERS] Runtime Partition Pruning

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, amul sul <sulamul(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: [HACKERS] Runtime Partition Pruning
Date: 2020-10-12 02:28:48
Message-ID: CAKU4AWoqn2Fej7D_2ZCJKqdbOmXiizu3q1xQk5Qz=uAikck0uA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Ashutosh:

On Thu, Oct 8, 2020 at 7:25 PM Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
wrote:

> On Wed, Oct 7, 2020 at 7:00 PM Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:
> >
> >
> >
> > On Wed, Oct 7, 2020 at 5:05 PM Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
> wrote:
> >>
> >>
> >>
> >> On Sun, Oct 4, 2020 at 3:10 PM Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
> wrote:
> >>>>
> >>>>
> >>>>
> >>>> Now, in my experience, the current system for custom plans vs. generic
> >>>> plans doesn't approach the problem in this way at all, and in my
> >>>> experience that results in some pretty terrible behavior. It will do
> >>>> things like form a custom plan every time because the estimated cost
> >>>> of the custom plan is lower than the estimated cost of the generic
> >>>> plan even though the two plans are structurally identical; only the
> >>>> estimates differ. It will waste gobs of CPU cycles by replanning a
> >>>> primary key lookup 5 times just on the off chance that a lookup on the
> >>>> primary key index isn't the best option. But this patch isn't going
> >>>> to fix any of that. The best we can probably do is try to adjust the
> >>>> costing for Append paths in some way that reflects the costs and
> >>>> benefits of pruning. I'm tentatively in favor of trying to do
> >>>> something modest in that area, but I don't have a detailed proposal.
> >>>>
> >>>
> >>> I just realized this issue recently and reported it at [1], then Amit
> pointed
> >>> me to this issue being discussed here, so I would like to continue
> this topic
> >>> here.
> >>>
> >>> I think we can split the issue into 2 issues. One is the partition
> prune in initial
> >>> partition prune, which maybe happen in custom plan case only and caused
> >>> the above issue. The other one happens in the "Run-Time" partition
> prune,
> >>> I admit that is an important issue to resolve as well, but looks
> harder. So I
> >>> think we can fix the first one at first.
> >>>
> >>> ... When we count for the cost of a
> >>> generic plan, we can reduce the cost based on such information.
> >>
> >>
> >> This way doesn't work since after the initial partition prune, not
> only the
> >> cost of the Append node should be reduced, the cost of other plans
> should
> >> be reduced as well [1]
> >>
> >> However I think if we can use partition prune information from a custom
> plan
> >> at the cost_append_path stage, it looks the issue can be fixed. If
> so, the idea
> >> is similar to David's idea in [2], however Robert didn't agree with
> this[2].
> >> Can anyone elaborate this objection? for a partkey > $1 or BETWEEN
> cases,
> >> some real results from the past are probably better than some hard-coded
> >> assumptions IMO.
> >
> >
> > I can understand Robert's idea now, he intended to resolve both the
> > "initial-partition-prune" case and "runtime partition prune" case while
> I always think
> > about the former case (Amit reminded me about that at the beginning, but
> I just
> > wake up right now..)
> >
> > With the Robert's idea, I think we can do some hack at
> create_append_path,
> > we can figure out the pruneinfo (it was done at create_append_plan now)
> at that
> > stage, and then did a fix_append_path with such pruneinfo. We need to
> fix not
> > only the cost of AppendPath, but also rows of AppendPath, For example:
> > SELECT .. FROM t1, t2, p where t1.a = p.partkey and t1.b = t2.b, if the
> > plan is:
> > Nest Loop
> > Nest Loop
> > t1
> > Append (p)
> > t2
> >
> > Then the rows of Append (p) will be very important.
> >
> > For this idea, how to estimate how many partitions/rows can be pruned
> is a key
> > part. Robert said "I was thinking, rather, that if we know for example
> that
> > we've doing pruning on partition_column = $1, then we know that only
> > one partition will match. That's probably a common case. If we've
> > got partition_column > $1, we could assume that, say, 75% of the
> > partitions would match. partition_column BETWEEN $1 and $2 is
> > probably a bit more selective, so maybe we assume 50% of the
> > partitions would match.". I think we can't say the 75% or 50% is a good
> > number, but the keypoint may be "partition_column = $1" is a common
> > case. And for the others case, we probably don't make it worse.
> >
> > I think we need to do something here, any thoughts? Personally I'm more
> > like this idea now.
>
> Yes, I think we have to do something on those lines. But I am
> wondering why our stats machinery is failing to do that already. For
> equality I think it's straight forward. But even for other operators
> we should use the statistics from the partitioned table to estimate
> the selectivity and then adjust number of scanned partitions = (number
> of partitions * fraction of rows scanned). That might be better than
> 50% or 75%.
>
>
Sorry for the late reply! Suppose we have partition defined like this:
p
- p1
- p2

When you talk about "the statistics from the partitioned table", do you
mean the statistics from p or p1/p2? I just confirmed there is no
statistics
for p (at least pg_class.reltuples = -1), so I think you are talking about
p1/p2.

Here we are talking about partkey = $1 or partkey = RunTimeValue.
so even the value can hit 1 partition only, but since we don't know
the exact value, so we treat all the partition equally. so looks
nothing wrong with partition level estimation. However when we cost
the Append path, we need know just one of them can be hit, then
we need do something there. Both AppendPath->rows/total_cost
should be adjusted (That doesn't happen now).

--
Best Regards
Andy Fan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andy Fan 2020-10-12 02:32:39 Re: [PATCH] Keeps tracking the uniqueness with UniqueKey
Previous Message tsunakawa.takay@fujitsu.com 2020-10-12 02:07:54 RE: Transactions involving multiple postgres foreign servers, take 2