Re: Parallel Seq Scan

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Thom Brown <thom(at)linux(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-03-26 02:59:09
Message-ID: CAA4eK1KU3ryj-VPtB4CcoyqHHjFWBvBo6H+nhZORMGSsVLTPRQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 25, 2015 at 9:53 PM, Thom Brown <thom(at)linux(dot)com> wrote:
>
> On 25 March 2015 at 15:49, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>
>> On Wed, Mar 25, 2015 at 5:16 PM, Thom Brown <thom(at)linux(dot)com> wrote:
>> > Okay, with my pgbench_accounts partitioned into 300, I ran:
>> >
>> > SELECT DISTINCT bid FROM pgbench_accounts;
>> >
>> > The query never returns,
>>
>> You seem to be hitting the issue I have pointed in near-by thread [1]
>> and I have mentioned the same while replying on assess-parallel-safety
>> thread. Can you check after applying the patch in mail [1]
>
>
> Ah, okay, here's the patches I've now applied:
>
> parallel-mode-v9.patch
> assess-parallel-safety-v4.patch
> parallel-heap-scan.patch
> parallel_seqscan_v12.patch
> release_lock_dsm_v1.patch
>
> (with perl patch for pg_proc.h)
>
> The query now returns successfully.
>

Thanks for the verification.

>> ..
>> >
>> > Still not sure why 8 workers are needed for each partial scan. I
>> > would expect 8 workers to be used for 8 separate scans. Perhaps this is
>> > just my misunderstanding of how this feature works.
>> >
>>
>> The reason is that for each table scan, it tries to use workers
>> equal to parallel_seqscan_degree if they are available and in this
>> case as the scan for inheritance hierarchy (tables in hierarchy) happens
>> one after another, it uses 8 workers for each scan. I think as of now
>> the strategy to decide number of workers to be used in scan is kept
>> simple and in future we can try to come with some better mechanism
>> to decide number of workers.
>
>
> Yes, I was expecting the parallel aspect to apply across partitions (a
> worker per partition up to parallel_seqscan_degree and reallocate to
> another scan once finished with current job), not individual ones,
>

What you are describing is something like a parallel partition scan,
which is a related but different feature. The feature in this patch
parallelizes the scan of an individual table.

> so for the workers to be above the funnel, not below it. So this is
> parallelising, just not in a way that will be a win in this case. :( For
> the query I posted (SELECT DISTINCT bid FROM pgbench_partitions), the
> parallelised version takes 8 times longer to complete.
>

I think the primary reason it is not performing as expected is that we
have either not set the right values for the new cost parameters or
changed an existing cost parameter (seq_page_cost), which makes the
planner select the parallel plan even though it is costlier.

This is similar to a user intentionally disabling index scans to test a
sequential scan and then reporting that the query performs slower.

I think if you want to help in this direction, it would be more useful
to see what the appropriate values of the cost parameters for parallel
scan could be. We have introduced 3 parameters (cpu_tuple_comm_cost,
parallel_setup_cost, parallel_startup_cost) for costing parallel plans.
If your tests can help us decide on a value for each of these parameters
such that the planner chooses a parallel plan only when it is better
than the non-parallel plan, that will be really valuable input.
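
To make the tuning goal concrete, here is a rough sketch of how the three
parameters could interact. The formulas below are only an assumed shape
for illustration and are not the costing code in the patch; the function
names and the sample values are hypothetical. The idea is that a parallel
plan pays a fixed overhead plus a per-tuple charge for every tuple sent
back from the workers, so it should win only when few tuples cross that
boundary.

```python
# Toy cost model. The formulas are illustrative assumptions, NOT the
# patch's actual costing; all helper names here are hypothetical.

def serial_cost(pages, tuples, seq_page_cost=1.0, cpu_tuple_cost=0.01):
    """Plain sequential scan: page reads plus per-tuple CPU."""
    return pages * seq_page_cost + tuples * cpu_tuple_cost


def parallel_cost(pages, tuples, workers, tuples_out,
                  cpu_tuple_comm_cost, parallel_setup_cost,
                  parallel_startup_cost):
    """Assumed shape: fixed overhead, scan work split across workers,
    and a communication charge for each tuple returned by the workers."""
    overhead = parallel_setup_cost + parallel_startup_cost
    scan = serial_cost(pages, tuples) / workers
    comm = tuples_out * cpu_tuple_comm_cost
    return overhead + scan + comm


def choose_plan(pages, tuples, workers, tuples_out, **params):
    """Pick the parallel plan only when its estimate is strictly lower."""
    par = parallel_cost(pages, tuples, workers, tuples_out, **params)
    return "parallel" if par < serial_cost(pages, tuples) else "serial"


# Hypothetical parameter values, chosen only to show both outcomes.
params = dict(cpu_tuple_comm_cost=0.1,
              parallel_setup_cost=0.0,
              parallel_startup_cost=1000.0)

# No filter: every tuple is passed back, so communication dominates.
print(choose_plan(1000, 100_000, 8, tuples_out=100_000, **params))

# Selective filter: only a few tuples are passed back.
print(choose_plan(1000, 100_000, 8, tuples_out=100, **params))
```

Under these made-up numbers the unfiltered scan stays serial and the
selective one goes parallel. Measurements from real tests are what would
tell us where that crossover should actually sit.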

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
