Re: Parallel Seq Scan

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-03-12 14:46:18
Message-ID: CAA4eK1++YrbjcKU8dph8_ueCQ9GAEXXJsLcVmYmLMh5w+-x3pA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 11, 2015 at 1:01 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Tue, Mar 3, 2015 at 7:47 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:
> > There is still some work left like integrating with
> > access-parallel-safety patch (use parallelmodeok flag to decide
> > whether parallel path can be generated, Enter/Exit parallel mode is
still
> > done during execution of funnel node).
> >
> > I think these are minor points which can be fixed once we decide
> > on the other major parts of patch. Find modified patch attached with
> > this mail.
>
> This is definitely progress. I do think you need to integrate it with
> the access-parallel-safety patch.

I have tried, but there are couple of failures while applying latest
access-parallel-safety patch, so left it as it is for now.

> Other comments:
>
> - There's not much code left in shmmqam.c. I think that the remaining
> logic should be integrated directly into nodeFunnel.c, with the two
> bools in worker_result_state becoming part of the FunnelState. It
> doesn't make sense to have a separate structure for two booleans and
> 20 lines of code. If you were going to keep this file around, I'd
> complain about its name and its location in the source tree, too, but
> as it is I think we can just get rid of it altogether.
>

Agreed. Moved the code/logic to nodeFunnel.c

> - Something is deeply wrong with the separation of concerns between
> nodeFunnel.c and nodePartialSeqscan.c. nodeFunnel.c should work
> correctly with *any arbitrary plan tree* as its left child, and that
> is clearly not the case right now. shm_getnext() can't just do
> heap_getnext(). Instead, it's got to call ExecProcNode() on its left
> child and let the left child decide what to do about that.

Agreed and made the required changes.

> The logic
> in InitFunnelRelation() belongs in the parallel seq scan node, not the
> funnel.

I think we should retain initialization of parallelcontext in InitFunnel().
Apart from that, I have moved other stuff to partial seq scan node.

> ExecReScanFunnel() cannot be calling heap_parallel_rescan();
> it needs to *not know* that there is a parallel scan under it.

Agreed. I think it is better to be do that as part of partial seq scan
node.

> The
> comment in FunnelRecheck is a copy-and-paste from elsewhere that is
> not applicable to a generic funnel mode.
>

With new changes, this API is not required.

> - The comment in execAmi.c refers to says "Backward scan is not
> suppotted for parallel sequiantel scan". "Sequential" is mis-spelled
> here, but I think you should just nuke the whole comment. The funnel
> node is not, in the long run, just for parallel sequential scan, so
> putting that comment above it is not right. If you want to keep the
> comment, it's got to be more general than that somehow, like "parallel
> nodes do not support backward scans", but I'd just drop it.
>
> - Can we rename create_worker_scan_plannedstmt to
> create_parallel_worker_plannedstmt?
>

Agreed and changed as per suggestion.

> - I *strongly* suggest that, for the first version of this, we remove
> all of the tts_fromheap stuff. Let's make no special provision for
> returning a tuple stored in a tuple queue; instead, just copy it and
> store it in the slot as a pfree-able tuple. That may be slightly less
> efficient, but I think it's totally worth it to avoid the complexity
> of tinkering with the slot mechanism.
>

Sure, removed (tts_fromheap becomes redundant with new changes).

> - InstrAggNode claims that we only need the master's information for
> statistics other than buffer usage and tuple counts, but is that
> really true? The parallel backends can be working on the parallel
> part of the plan while the master is doing something else, so the
> amount of time the *master* spent in a particular node may not be that
> relevant.

Yes, but isn't other nodes also work this way, example join node will
display the accumulated stats for buffer usage, but for timing, it will
just use the time for that node (which automatically includes some
part of execution of child nodes, but it is not direct accumulation)?

> We might need to think carefully about what it makes sense
> to display in the EXPLAIN output in parallel cases.
>

Currently the Explain for parallel scan on relation will display the
Funnel node which contains aggregated stat of all workers and the
number of workers and Partial Seq Scan node containing stats for
the scan done by master backend. Do we want to display something
more?

Current result of Explain statement is as below:
postgres=# explain (analyze,buffers) select c1 from t1 where c1 > 90000;
QUERY PLAN

--------------------------------------------------------------------------------
-------------------------------------------
Funnel on t1 (cost=0.00..43750.44 rows=9905 width=4) (actual
time=1097.236..15
30.416 rows=10000 loops=1)
Filter: (c1 > 90000)
Rows Removed by Filter: 65871
Number of Workers: 2
Buffers: shared hit=96 read=99905
-> Partial Seq Scan on t1 (cost=0.00..101251.01 rows=9905 width=4)
(actual
time=1096.188..1521.810 rows=2342 loops=1)
Filter: (c1 > 90000)
Rows Removed by Filter: 24130
Buffers: shared hit=33 read=26439
Planning time: 0.143 ms
Execution time: 1533.438 ms
(11 rows)

> - The header comment on nodeFunnel.h capitalizes the filename incorrectly.
>

Changed.

One additional change (we need to SetLatch()
in HandleParallelMessageInterrupt)
is done to handle the hang issue reported on parallel-mode thread.
Without this change it is difficult to verify the patch (will remove this
change
once new version of parallel-mode patch containing this change will be
posted).

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
parallel_seqscan_v10.patch application/octet-stream 103.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2015-03-12 14:52:02 recovery_target_action doesn't work for anything but shutdown
Previous Message Tom Lane 2015-03-12 14:15:12 Re: logical column ordering