Re: Parallel Index Scans

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Index Scans
Date: 2016-10-20 03:07:45
Message-ID: CAA4eK1+V9L9Tdv19S17=AqveDiSdZhWqfq8Eypvm3Go0q_EsTg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 20, 2016 at 7:39 AM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> On Mon, Oct 17, 2016 at 8:08 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> Create Index .... With (parallel_workers = 4);
>>
>> If above syntax looks sensible, then we might need to think what
>> should be used for parallel index build. It seems to me that parallel
>> tuple sort patch [1] proposed by Peter G. is using above syntax for
>> getting the parallel workers input from user for parallel index
>> builds.
>
> Apparently you see a similar issue with other major database systems,
> where similar storage parameter things are kind of "overloaded" like
> this (they are used by both index creation, and by the optimizer in
> considering whether it should use a parallel index scan). That can be
> a kind of a gotcha for their users, but maybe it's still worth it.
>

I have also checked and found that you are right. In SQL Server, they
are using max degree of parallelism (MAXDOP) parameter which is I
think is common for all the sql statements.

> In
> any case, the complaints I saw about that were from users who used
> parallel CREATE INDEX with the equivalent of my parallel_workers index
> storage parameter, and then unexpectedly found this also forced the
> use of parallel index scan. Not the other way around.
>

I can understand that it can be confusing to users, so other option
could be to provide separate parameters like parallel_workers_build
and parallel_workers where first can be used for index build and
second can be used for scan. My personal opinion is to have one
parameter, so that users have one less thing to learn about
parallelism.

> Ideally, the parallel_workers storage parameter will rarely be
> necessary because the optimizer will generally do the right thing in
> all case.
>

Yeah, we can choose not to provide any parameter for parallel index
scans, but some users might want to have a parameter similar to
parallel table scans, so it could be handy for them to use.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2016-10-20 03:24:53 Re: Parallel Index Scans
Previous Message Michael Paquier 2016-10-20 03:04:14 Re: Disable autovacuum guc?