Re: Parallel Seq Scan

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-01-21 07:17:24
Message-ID: 54BF5284.8020009@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 20-01-2015 PM 11:29, Amit Kapila wrote:
>
> I have taken care of integrating the parallel sequence scan with the
> latest patch posted (parallel-mode-v1.patch) by Robert at below
> location:
> http://www.postgresql.org/message-id/CA+TgmoZdUK4K3XHBxc9vM-82khourEZdvQWTfgLhWsd2R2aAGQ@mail.gmail.com
>
> Changes in this version
> -----------------------------------------------
> 1. As mentioned previously, I have exposed one parameter
> ParallelWorkerNumber as used in parallel-mode patch.
> 2. Enabled tuple queue to be used for passing tuples from
> worker backend to master backend along with error queue
> as per suggestion by Robert in the mail above.
> 3. Involved master backend to scan the heap directly when
> tuples are not available in any shared memory tuple queue.
> 4. Introduced 3 new parameters (cpu_tuple_comm_cost,
> parallel_setup_cost, parallel_startup_cost) for deciding the cost
> of parallel plan. Currently, I have kept the default values for
> parallel_setup_cost and parallel_startup_cost as 0.0, as those
> require some experiments.
> 5. Fixed some issues (related to memory increase as reported
> upthread by Thom Brown and general feature issues found during
> test)
>
> Note - I have yet to handle the new node types introduced at some
> of the places and need to verify prepared queries and some other
> things, however I think it will be good if I can get some feedback
> at current stage.
>

I got an assertion failure:

In src/backend/executor/execTuples.c: ExecStoreTuple()

/* passing shouldFree=true for a tuple on a disk page is not sane */
Assert(BufferIsValid(buffer) ? (!shouldFree) : true);

when called from:

In src/backend/executor/nodeParallelSeqscan.c: ParallelSeqNext()

I think something like the following would be necessary (reading from
comments in the code):

--- a/src/backend/executor/nodeParallelSeqscan.c
+++ b/src/backend/executor/nodeParallelSeqscan.c
@@ -85,7 +85,7 @@ ParallelSeqNext(ParallelSeqScanState *node)
if (tuple)
ExecStoreTuple(tuple,
slot,
- scandesc->rs_cbuf,
+ fromheap ? scandesc->rs_cbuf : InvalidBuffer,
!fromheap);

After fixing this, the assertion failure seems to be gone though I
observed the blocked (CPU maxed out) state as reported elsewhere by Thom
Brown.

What I was doing:

CREATE TABLE test(a) AS SELECT generate_series(1, 10000000);

postgres=# SHOW max_worker_processes;
max_worker_processes
----------------------
8
(1 row)

postgres=# SET seq_page_cost TO 100;
SET

postgres=# SET parallel_seqscan_degree TO 4;
SET

postgres=# EXPLAIN SELECT * FROM test;
QUERY PLAN
-------------------------------------------------------------------------
Parallel Seq Scan on test (cost=0.00..1801071.27 rows=8981483 width=4)
Number of Workers: 4
Number of Blocks Per Worker: 8849
(3 rows)

Though, EXPLAIN ANALYZE caused the thing.

Thanks,
Amit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2015-01-21 07:22:31 Re: B-Tree support function number 3 (strxfrm() optimization)
Previous Message Amit Kapila 2015-01-21 07:11:21 Re: parallel mode and parallel contexts