Re: Parallel Seq Scan

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-01-10 05:22:20
Message-ID: CAA4eK1J7opEd_VcCU=mROeMQo8mWMYC-xMV3Ln13YZODSFfqPw@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jan 10, 2015 at 2:45 AM, Stefan Kaltenbrunner
<stefan(at)kaltenbrunner(dot)cc> wrote:
>
> On 01/09/2015 08:01 PM, Stephen Frost wrote:
> > Amit,
> >
> > * Amit Kapila (amit(dot)kapila16(at)gmail(dot)com) wrote:
> >> On Fri, Jan 9, 2015 at 1:02 AM, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:
> >>> I agree, but we should try and warn the user if they set
> >>> parallel_seqscan_degree close to max_worker_processes, or at least give
> >>> some indication of what's going on. This is something you could end up
> >>> beating your head on wondering why it's not working.
> >>
> >> Yet another way to handle the case when enough workers are not
> >> available is to let user specify the desired minimum percentage of
> >> requested parallel workers with parameter like
> >> PARALLEL_QUERY_MIN_PERCENT. For example, if you specify
> >> 50 for this parameter, then at least 50% of the parallel workers
> >> requested for any parallel operation must be available in order for
> >> the operation to succeed else it will give error. If the value is set to
> >> null, then all parallel operations will proceed as long as at least two
> >> parallel workers are available for processing.
> >
> > Now, for debugging purposes, I could see such a parameter being
> > available but it should default to 'off/never-fail'.
>
> not sure what it really would be useful for - if I execute a query I
> would truly expect it to get answered - if it can be made faster if
> done in parallel that's nice but why would I want it to fail?
>

One use case where I could imagine it being useful is when a query
would take many hours if run sequentially but could finish in minutes
with 16 parallel workers. If fewer than 30% of the requested parallel
workers are available at execution time, that might not be acceptable
to the user, who would rather wait for some time and run the query
again. A user who wants the query to run even if only 2 workers are
available can simply choose not to set such a parameter.
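
To make the proposed semantics concrete, here is a minimal sketch of
the check, not code from any patch. The function name, the
MIN_PERCENT_UNSET sentinel (standing in for "null"), and the argument
names are all hypothetical; only the rules themselves come from the
description quoted above.

```c
#include <stdbool.h>

/* Hypothetical sentinel standing in for the "null" (unset) value. */
#define MIN_PERCENT_UNSET (-1)

/*
 * Hypothetical helper: decide whether a parallel operation may proceed.
 *
 * requested   - number of parallel workers the operation asked for
 * available   - number of workers that could actually be obtained
 * min_percent - PARALLEL_QUERY_MIN_PERCENT-style setting (0..100),
 *               or MIN_PERCENT_UNSET when it is not set
 *
 * Returns false where the proposal would raise an error.
 */
static bool
parallel_op_may_proceed(int requested, int available, int min_percent)
{
    /* Unset: proceed as long as at least two workers are available. */
    if (min_percent == MIN_PERCENT_UNSET)
        return available >= 2;

    /*
     * Set: require available / requested >= min_percent / 100.
     * Cross-multiplying keeps the comparison in integer arithmetic.
     */
    return available * 100 >= requested * min_percent;
}
```

With 16 workers requested and a 30% minimum, 4 available workers fall
short of the threshold (4.8) and the operation fails, while 5 are
enough. With the parameter unset, 2 available workers suffice.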

Having said that, I also feel this is not an important enough case
to justify introducing a new parameter and such behaviour. I mentioned
it only because I came across how some other databases handle
such a situation. Let's forget this suggestion if we can't imagine any
use for such a parameter.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
