From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
Cc: | Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: why not parallel seq scan for slow functions |
Date: | 2017-07-13 02:08:45 |
Message-ID: | CAA4eK1K3pxVknO5yLjHTi2ciSO9Yabi2yCRnrw5=PTJR-FoQ7g@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Jul 12, 2017 at 11:20 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Tue, Jul 11, 2017 at 10:25 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
>>
>> On Wed, Jul 12, 2017 at 1:50 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>> > On Mon, Jul 10, 2017 at 9:51 PM, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
>> > wrote:
>> >>
>> >> So because of this high projection cost the seqpath and parallel path
>> >> both have fuzzily the same cost, but the seqpath is winning because it's
>> >> parallel safe.
>> >
>> >
>> > I think you are correct. However, unless parallel_tuple_cost is set very
>> > low, apply_projection_to_path never gets called with the Gather path as an
>> > argument. It gets ruled out at some earlier stage, presumably because it
>> > assumes the projection step cannot make it win if it is already behind by
>> > enough.
>> >
>>
>> I think that is genuine because tuple communication cost is very high.
>
>
> Sorry, I don't know which you think is genuine, the early pruning or my
> complaint about the early pruning.
>
Early pruning. Currently, we have no way to keep both the parallel and
non-parallel paths until a later stage and then decide which one is
better. Maintaining both can increase planning cost substantially in
the case of joins. That said, it would surely help in many cases, so it
is a worthwhile direction to pursue.
> I agree that the communication cost is high, which is why I don't want to
> have to set parallel_tuple_cost very low. For example, to get the benefit
> of your patch, I have to set parallel_tuple_cost to 0.0049 or less (in my
> real-world case, not the dummy test case I posted, although the numbers are
> around the same for that one too). But with a setting that low, all kinds
> of other things also start using parallel plans, even if they don't benefit
> from them and are harmed.
>
> I realize we need to do some aggressive pruning to avoid an exponential
> explosion in planning time, but in this case it has some rather unfortunate
> consequences. I wanted to explore it, but I can't figure out where this
> particular pruning is taking place.
>
> By the time we get to planner.c line 1787, current_rel->pathlist already
> does not contain the parallel plan if parallel_tuple_cost >= 0.0050, so the
> pruning is happening earlier than that.
>
Check generate_gather_paths.
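
To make the ordering problem concrete, here is a rough toy model in Python, not PostgreSQL source. It assumes the Gather path is compared against the plain seq scan before the expensive target list is costed, and that the parallel divisor is the worker count plus a leader share of 1 - 0.3 * workers when that is positive. The table size and the per-row cost of the slow function are hypothetical; the other defaults mirror the stock cost settings. The 1% fuzz factor used when comparing paths is ignored.

```python
# Toy model of the cost comparison; a sketch under stated assumptions,
# not the planner's actual code.

def parallel_divisor(workers):
    """Workers plus the leader's assumed share (1 - 0.3 * workers, if positive)."""
    leader = 1.0 - 0.3 * workers
    return workers + (leader if leader > 0 else 0.0)

def seq_scan_cost(pages, rows, proj_per_tuple=0.0,
                  seq_page_cost=1.0, cpu_tuple_cost=0.01):
    """Page cost plus per-tuple CPU cost (optionally including projection)."""
    return seq_page_cost * pages + (cpu_tuple_cost + proj_per_tuple) * rows

def gather_cost(pages, rows, workers, proj_per_tuple=0.0,
                seq_page_cost=1.0, cpu_tuple_cost=0.01,
                parallel_tuple_cost=0.1, parallel_setup_cost=1000.0):
    """Partial scan (only CPU cost is divided) plus setup and per-tuple transfer cost."""
    div = parallel_divisor(workers)
    partial = seq_page_cost * pages + (cpu_tuple_cost + proj_per_tuple) * rows / div
    return parallel_setup_cost + partial + parallel_tuple_cost * rows

pages, rows, workers = 10_000, 1_000_000, 2   # hypothetical table
slow_func = 0.25                               # hypothetical per-row function cost

# Comparison made before the slow function is costed: Gather looks far worse.
early_seq = seq_scan_cost(pages, rows)
early_gather = gather_cost(pages, rows, workers)
gather_pruned = early_gather > early_seq

# Comparison that would result if projection were costed below the Gather.
late_seq = seq_scan_cost(pages, rows, slow_func)
late_gather = gather_cost(pages, rows, workers, slow_func)
gather_would_win = late_gather < late_seq
```

With these made-up numbers, both `gather_pruned` and `gather_would_win` come out true: the path is discarded at a point where it still looks expensive, even though it would have been cheaper once the projection cost was divided among the workers.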
>
>>
>> If your table is reasonably large then you might want to try
>> increasing parallel workers (ALTER TABLE ... SET (parallel_workers =
>> ..))
>
>
>
> Setting parallel_workers to 8 changes the threshold for the parallel plan to
> even be considered from parallel_tuple_cost <= 0.0049 to <= 0.0076. So it is
> going in the correct direction, but not by enough to matter.
>
You might want to play with cpu_tuple_cost and/or seq_page_cost.
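
As a rough illustration of how these knobs move the threshold, the Python sketch below derives a break-even parallel_tuple_cost from a simplified model: the Gather path costs parallel_setup_cost plus the partial scan plus parallel_tuple_cost per row, and only the CPU part of the scan is divided among workers. In that model the page cost appears on both sides and cancels, so it does not show up in the formula; the real planner may behave differently. The row count is hypothetical, and the function name is made up for this example.

```python
# Break-even parallel_tuple_cost in a simplified cost model;
# a sketch under stated assumptions, not the planner's actual code.

def parallel_divisor(workers):
    """Workers plus the leader's assumed share (1 - 0.3 * workers, if positive)."""
    leader = 1.0 - 0.3 * workers
    return workers + (leader if leader > 0 else 0.0)

def break_even_parallel_tuple_cost(rows, workers,
                                   cpu_tuple_cost=0.01,
                                   parallel_setup_cost=1000.0):
    """Largest parallel_tuple_cost at which Gather is no more costly than a seq scan."""
    # CPU cost saved per row by dividing the scan among workers.
    saved_per_row = cpu_tuple_cost * (1.0 - 1.0 / parallel_divisor(workers))
    # The fixed setup cost is spread over all returned rows.
    return saved_per_row - parallel_setup_cost / rows

rows = 1_000_000  # hypothetical table size

t_2_workers = break_even_parallel_tuple_cost(rows, workers=2)
t_8_workers = break_even_parallel_tuple_cost(rows, workers=8)
t_higher_cpu = break_even_parallel_tuple_cost(rows, workers=2, cpu_tuple_cost=0.02)
```

In this model, going from 2 to 8 workers raises the threshold only modestly, which is the same trend reported above, while doubling cpu_tuple_cost roughly doubles the per-row saving and moves the threshold much further.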
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com