Re: [DESIGN] ParallelAppend

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: [DESIGN] ParallelAppend
Date: 2015-07-28 08:22:48
Message-ID: CAFjFpRcjiQcqEPtNjm84=9cFrsXF3LahGUP1vseF3_9yXWDLTg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jul 28, 2015 at 12:59 PM, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:

>
> On 27 July 2015 at 21:09, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
>> Hello, can I ask some questions?
>>
>> I suppose we can take this as the analog of ParallelSeqScan. I
>> can see little distinction between Append(ParallelSeqScan) and
>> ParallelAppend(SeqScan). What difference is there between them?
>>
>> If other nodes will have the same functionality, as you mention at
>> the end of this proposal, it might be better to implement some part
>> of this feature in the existing executor itself rather than as a
>> dedicated additional node, just as my asynchronous fdw execution
>> patch partially does. (Although it lacks the planner part and
>> bgworker launching.) If that is the case, it might be better to
>> modify ExecProcNode so that it supports both the in-process and
>> inter-bgworker cases through a single API.
>>
>> What do you think about this?
>>
>
> I have to say that I really like the thought of us having parallel enabled
> stuff in Postgres, but I also have to say that I don't think inventing all
> these special parallel node types is a good idea. If we think about
> everything that we can parallelise...
>
> Perhaps... sort, hash join, seqscan, hash, bitmap heap scan, nested loop.
> I don't want to debate that, but perhaps there's more, perhaps less.
> Are we really going to duplicate all of the code and add in the parallel
> stuff as new node types?
>
> My other concern here is that I seldom hear people talk about the
> planner's architectural inability to make a good choice about how
> many parallel workers to use. Surely, to calculate costs properly, you
> need to know the exact number of parallel workers that will be available at
> execution time, but you need to know this at planning time!? I can't see
> how this works, apart from just being very conservative about parallel
> workers, which I think is really bad. Many databases have busy times in
> the day and also quiet times; quiet time is generally when large batch
> stuff gets done, and that's the time that parallel stuff is likely most
> useful. Remember that queries are not always planned just before they're
> executed. We could have a PREPAREd query, or we could have better plan
> caching in the future. And if we build some intelligence into the planner to
> choose a good number of workers based on the current server load, then
> what's to say that the server will be under this load at exec time? If we
> plan during a quiet time and exec in a busy time, all hell may break loose.
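
To make the plan-time versus exec-time concern above concrete, here is a rough sketch. It is a toy cost model, not PostgreSQL's costing code: the function names, the per-worker setup cost and the example plan costs are all assumptions picked only for illustration.

```python
# Toy model only. None of these names or constants come from PostgreSQL;
# they are hypothetical and exist purely to illustrate the argument.

def parallel_cost(run_cost, workers, setup_cost_per_worker=1000.0):
    """Estimated cost of a scan whose work is split across the leader
    plus `workers` background workers. workers == 0 is the serial case."""
    participants = workers + 1  # assume the leader also processes tuples
    return setup_cost_per_worker * workers + run_cost / participants


def pick_plan(index_cost, seq_run_cost, workers):
    """Choose between a fixed-cost index scan and a (possibly parallel)
    sequential scan, given the number of workers assumed to be available."""
    seq_cost = parallel_cost(seq_run_cost, workers)
    return "parallel seq scan" if seq_cost < index_cost else "index scan"


if __name__ == "__main__":
    index_cost = 40000.0     # hypothetical cost of the index-based plan
    seq_run_cost = 100000.0  # hypothetical serial cost of the seq scan

    # Planned during a quiet period: 4 workers are assumed to be free.
    print("planned with 4 workers:", pick_plan(index_cost, seq_run_cost, 4))

    # Executed during a busy period: no workers can be launched, so the
    # cached plan pays the serial seq scan cost instead.
    print("cost paid with 0 workers:", parallel_cost(seq_run_cost, 0))
    print("best choice with 0 workers:", pick_plan(index_cost, seq_run_cost, 0))
```

Under these made-up numbers the plan chosen with four workers in mind is the wrong one once no workers are available, which is the mismatch described in the quoted paragraph.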
>
> I really do think that existing nodes should just be initialized in a
> parallel mode, and each node type can have a function to state if it
> supports parallelism or not.
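
The suggestion above might look roughly like the following. This is a minimal sketch in Python rather than executor C code, and everything in it is hypothetical: the `PlanNode` class, the `supports_parallel` and `init_node` functions, and the set of node types marked as parallel-capable are assumptions for illustration, not anything from an existing patch.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical set of node types that can run in a parallel mode.
# Which nodes really qualify is exactly what the thread is debating.
PARALLEL_CAPABLE = {"Append", "SeqScan", "HashJoin"}


@dataclass
class PlanNode:
    name: str
    children: List["PlanNode"] = field(default_factory=list)
    parallel_mode: bool = False


def supports_parallel(node: PlanNode) -> bool:
    """Per-node-type capability check (a stand-in for a per-node callback)."""
    return node.name in PARALLEL_CAPABLE


def init_node(node: PlanNode, want_parallel: bool) -> PlanNode:
    """Initialise an existing node type, switching on its parallel mode
    only when parallelism is requested and the node type supports it."""
    node.parallel_mode = want_parallel and supports_parallel(node)
    for child in node.children:
        init_node(child, want_parallel)
    return node


if __name__ == "__main__":
    plan = PlanNode("Append", [PlanNode("SeqScan"), PlanNode("IndexScan")])
    init_node(plan, want_parallel=True)
    for n in [plan] + plan.children:
        print(n.name, "parallel" if n.parallel_mode else "serial")
```

The point of the sketch is only that the same node types are reused, with a mode flag set at initialisation, instead of adding a separate parallel twin for each node.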
>
> I'd really like to hear more opinions on the ideas I discussed here:
>
>
> http://www.postgresql.org/message-id/CAApHDvp2STf0=pQfpq+e7WA4QdYmpFM5qu_YtUpE7R0jLnH82Q@mail.gmail.com
>
>
> This design makes use of the Funnel node that Amit has already made and
> allows more than 1 node to be executed in parallel at once.
>
> It appears that parallel enabling the executor node by node is
> fundamentally locked into just 1 node being executed in parallel, then
> perhaps a Funnel node gathering up the parallel worker buffers and
> streaming those back in serial mode. I believe that, by design, this does not
> permit a whole plan branch to execute in parallel, and I really feel
> like doing things this way is going to be very hard to undo and improve
> later. I might be too stupid to figure it out, but how would parallel hash
> join work if it can't gather tuples from the inner and outer nodes in
> parallel?
>
> Sorry for the rant, but I just feel like we're painting ourselves into a
> corner by parallel enabling the executor node by node.
> Apologies if I've completely misunderstood things.
>
>
+1, well articulated.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
