Re: Parallel Seq Scan

From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Antonin Houska <ah(at)cybertec(dot)at>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-07-20 08:06:32
Message-ID: CAJrrPGf_8KnCeCO_vfJ8zuKeX_CSekH7Bj-cZxZQ3d2fYLvpdA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jul 20, 2015 at 3:31 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Fri, Jul 17, 2015 at 1:22 PM, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
> wrote:
>>
>> On Thu, Jul 16, 2015 at 1:10 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>> wrote:
>> > Thanks, I will fix this in next version of patch.
>> >
>>
>> I am posting in this thread because I am not sure whether it needs a
>> separate thread.
>>
>> I went through the code and found that the newly added funnel node is
>> tightly coupled with partial seq scan. In order to add more parallel
>> plans along with parallel seq scan, we need to remove the integration
>> of this node with partial seq scan.
>>
>
> This assumption is wrong; the Funnel node can execute any node beneath
> it (refer to ExecFunnel->funnel_getnext->ExecProcNode; similarly, you
> can see exec_parallel_stmt).

Yes, the funnel node can execute any node beneath it. But during the planning
phase, the funnel path is added only on top of a partial scan path. I just want
the same to be enhanced to support other parallel nodes.

> Yes, currently the nodes supported under
> Funnel nodes are limited, like partialseqscan and result (due to reasons
> mentioned upthread: readfuncs.c doesn't have support to read Plan
> nodes, which is required for the worker backend to read the PlannedStmt.
> Of course we can add them, but as we are supporting parallelism for
> limited nodes, I have not enhanced readfuncs.c), but in general
> the basic infrastructure is designed in such a way that it can support
> other nodes beneath it.
>
>> To achieve the same, I have the following ideas.
>>
>>
>> Execution:
>> The funnel execution varies based on the plan node below it.
>> 1) partial scan - Funnel does the local scan also and returns the tuples
>> 2) partial agg - Funnel does the merging of aggregate results and
>> returns the final result.
>>
>
> Basically, Funnel will execute any node beneath it. The Funnel node itself
> is not responsible for doing a local scan or any form of consolidation of
> results. As of now, it has these 3 basic properties:
> – Has one child, runs multiple copies in parallel.
> – Combines the results into a single tuple stream.
> – Can run the child itself if no workers available.

+ if (!funnelstate->local_scan_done)
+ {
+ outerPlan = outerPlanState(funnelstate);
+
+ outerTupleSlot = ExecProcNode(outerPlan);

In the above code from the funnel_getnext function, the funnel directly calls
the node below it to do the scan on the backend side as well. This code should
check the type of the node below and run the backend-side scan only when that
node type allows it.

I feel that always executing the outer plan may not be correct for other parallel nodes.
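
To illustrate what I mean, here is a minimal standalone sketch. It is not code
from the patch: the Toy* types and toy_* functions are hypothetical stand-ins
for the executor structures, and the rule that a partial agg child is not run
locally is only an assumption for the example. It just shows the backend-side
scan being gated on the type of the node below the funnel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node tags; not the real NodeTag values from the patch. */
typedef enum ToyNodeTag
{
    TOY_PARTIAL_SEQ_SCAN,
    TOY_PARTIAL_AGG
} ToyNodeTag;

/* Toy child node: hands out tuple ids 0 .. ntuples-1. */
typedef struct ToyPlanState
{
    ToyNodeTag  tag;
    int         next;       /* next local tuple id to return */
    int         ntuples;    /* how many tuples the local scan can produce */
} ToyPlanState;

/* Toy funnel state, mirroring only the local_scan_done flag. */
typedef struct ToyFunnelState
{
    ToyPlanState *outer;
    bool          local_scan_done;
} ToyFunnelState;

/* Assumed policy: only a partial seq scan may be run by the backend itself. */
static bool
toy_child_allows_local_scan(const ToyPlanState *child)
{
    switch (child->tag)
    {
        case TOY_PARTIAL_SEQ_SCAN:
            return true;
        case TOY_PARTIAL_AGG:
            return false;
    }
    return false;
}

/* Stand-in for ExecProcNode: returns a tuple id, or -1 when exhausted. */
static int
toy_exec_proc_node(ToyPlanState *node)
{
    if (node->next < node->ntuples)
        return node->next++;
    return -1;
}

/*
 * Local half of a funnel_getnext-like function: run the child in the
 * backend only if its node type allows it; otherwise mark the local
 * scan as done without touching the child.
 */
static int
toy_funnel_getnext_local(ToyFunnelState *fs)
{
    assert(fs != NULL && fs->outer != NULL);

    if (!fs->local_scan_done && toy_child_allows_local_scan(fs->outer))
    {
        int tuple = toy_exec_proc_node(fs->outer);

        if (tuple >= 0)
            return tuple;
    }

    fs->local_scan_done = true;
    return -1;
}
```

With a partial seq scan child the backend returns its local tuples and then
sets local_scan_done. With a partial agg child it never calls the child and
would rely only on the results coming from the workers.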

>> Any other better ideas to achieve the same?
>>
>
> Refer to slides 16-19 in the Parallel Sequential Scan presentation at PGCon:
> https://www.pgcon.org/2015/schedule/events/785.en.html

Thanks for the information.

> I don't have a very clear idea of what the best way is to transform the
> nodes in the optimizer, but I think we can figure that out later unless a
> majority of people see that as a blocking factor.

I also do not see it as a blocking factor for parallel scan.
I wrote the above mail to get some feedback/suggestions from hackers on
how to proceed in adding other parallel nodes along with parallel scan.

Regards,
Hari Babu
Fujitsu Australia
