Quick Links

Re: Parallel Seq Scan

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Jeff Davis <pgsql(at)j-davis(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Parallel Seq Scan
Date:	2015-09-24 18:31:17
Message-ID:	CAA4eK1+Lq7TNb3eQ=RuSkrnUK1BomEJ89x5+mBUR4O04f9HQCQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Thu, Sep 17, 2015 at 2:28 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Thu, Sep 3, 2015 at 6:21 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:

I have fixed most of the review comments raised in this mail
as well as other e-mails and rebased the patch on commit-
020235a5. Even though I have fixed many of the things, but
still quite a few comments are yet to be handled. This patch
is mainly a rebased version to ease the review. We can continue
to have discussion on the left over things and I will address
those in consecutive patches.

>
> + if (es->analyze && nodeTag(plan) == T_Funnel)
>
> Why not IsA()?
>

Changed as per suggestion.

> + FinishParallelSetupAndAccumStats((FunnelState *)planstate);
>
> Shouldn't there be a space before planstate?
>

Yes, fixed.

> + /* inform executor to collect buffer usage stats from parallel
workers. */
> + estate->total_time = queryDesc->totaltime ? 1 : 0;
>
> Boy, the comment sure doesn't seem to match the code.
>

This is not required as per latest version of patch, so I have removed the
same.

> + * Accumulate the stats by parallel workers before stopping the
> + * node.
>
> Suggest: "Accumulate stats from parallel workers before stopping node".
>
> + * If we are not able to send the tuple, then we assume that
> + * destination has closed and we won't be able to send any
more
> + * tuples so we just end the loop.
>
> Suggest: "If we are not able to send the tuple, we assume the
> destination has closed and no more tuples can be sent. If that's the
> case, end the loop."
>

Changed as per suggestion.

> +static void
> +EstimateParallelSupportInfoSpace(ParallelContext *pcxt, ParamListInfo
params,
> + List *serialized_param_exec_vals,
> + int instOptions, Size *params_size,
> + Size *params_exec_size);
> +static void
> +StoreParallelSupportInfo(ParallelContext *pcxt, ParamListInfo params,
> + List *serialized_param_exec_vals,
> + int instOptions, Size params_size,
> + Size params_exec_size,
> + char **inst_options_space,
> + char **buffer_usage_space);
>
> Whitespace doesn't look like PostgreSQL style. Maybe run pgindent on
> the newly-added files?
>

I have ran pgindent on all new files as well as changes for funnel related
patch (parallel_seqscan_funnel_v18.patch). I have not ran pgindent
for changes in partial seq scan patch, as I feel still there are couple of
things like handling of subplans needs to be done for that patch.

> +/*
> + * This is required for parallel plan execution to fetch the information
> + * from dsm.
> + */
>
> This comment doesn't really say anything. Can we get a better one?
>

Changed.

> + /*
> + * We expect each worker to populate the BufferUsage structure
> + * allocated by master backend and then master backend will aggregate
> + * all the usage along with it's own, so account it for each worker.
> + */
>
> This also needs improvement. Especially because...
>
> + /*
> + * We expect each worker to populate the instrumentation structure
> + * allocated by master backend and then master backend will aggregate
> + * all the information, so account it for each worker.
> + */
>
> ...it's almost identical to this one.
>

Combined both of the comments.

> + * Store bind parameter's list in dynamic shared memory. This is
> + * used for parameters in prepared query.
>
> s/bind parameter's list/bind parameters/. I think you could drop the
> second sentence, too.
>
> + /*
> + * Store PARAM_EXEC parameters list in dynamic shared memory. This
is
> + * used for evaluation plan->initPlan params.
> + */
>
> So is the previous block for PARAM_EXTERN and this is PARAM_EXEC? If
> so, maybe that could be more clearly laid out.
>

Changed as per suggestion.

> +GetParallelSupportInfo(shm_toc *toc, ParamListInfo *params,
>
> Could this be a static function? Will it really be needed outside this
file?
>
> And is there any use case for letting some of the arguments be NULL?
> Seems kind of an awkward API.
>

Nopes, so changed it.

> +bool
> +ExecParallelBufferUsageAccum(Node *node)
> +{
> + if (node == NULL)
> + return false;
> +
> + switch (nodeTag(node))
> + {
> + case T_FunnelState:
> + {
> + FinishParallelSetupAndAccumStats((FunnelState*)node);
> + return true;
> + }
> + break;
> + default:
> + break;
> + }
> +
> + (void) planstate_tree_walker((Node*)((PlanState *)node)->lefttree,
NULL,
> + ExecParallelBufferUsageAccum, 0);
> + (void) planstate_tree_walker((Node*)((PlanState *)node)->righttree,
NULL,
> + ExecParallelBufferUsageAccum, 0);
> + return false;
> +}
>
> This seems wacky. I mean, isn't the point of planstate_tree_walker()
> that the callback itself doesn't have to handle recursion like this?
> And if not, then this wouldn't be adequate anyway, because some
> planstate nodes have children that are not in lefttree or righttree
> (cf. explain.c).
>

After rebasing, this part has changed entirely and I think now it is
as per your suggestions.

> + currentRelation = ExecOpenScanRelation(estate,
> + ((SeqScan *)
> node->ss.ps.plan)->scanrelid,
> + eflags);
>
> I can't see how this can possibly be remotely correct. The funnel
> node shouldn't be limited to scanning a baserel (cf. fdw_scan_tlist).
>

Yes, I have removed the InitFunnel related code as discussed upthread.

> +void ExecAccumulateInstInfo(FunnelState *node)
>
> Another place where pgindent would help. There are a bunch of others
> I noticed too, but I'm just mentioning a few here to make the point.
>
> + buffer_usage_worker = (BufferUsage *)(buffer_usage + (i *
> sizeof(BufferUsage)));
>
> Cast it to a BufferUsage * first. Then you can use &foo[i] to find
> the i'th element.
>

Changed as per suggestion.

> + /*
> + * Re-initialize the parallel context and workers to perform
> + * rescan of relation. We want to gracefully shutdown all the
> + * workers so that they should be able to propagate any error
> + * or other information to master backend before dying.
> + */
> + FinishParallelSetupAndAccumStats(node);
>
> Somehow, this makes me feel like that function is badly named.
>

Okay, changed the function name to start with Destroy*.

> +/*
> + * _readPlanInvalItem
> + */
> +static PlanInvalItem *
> +_readPlanInvalItem(void)
> +{
> + READ_LOCALS(PlanInvalItem);
> +
> + READ_INT_FIELD(cacheId);
> + READ_UINT_FIELD(hashValue);
> +
> + READ_DONE();
> +}
>
> I don't see why we should need to be able to copy PlanInvalItems. In
> fact, it seems like a bad idea.
>

This has been addressed as part of readfuncs related patch.

> +#parallel_setup_cost = 0.0 # same scale as above
> +#define DEFAULT_PARALLEL_SETUP_COST 0.0
>
> This value is probably a bit on the low side.
>

Okay, for now I have changed it 1000, let me know if you think it still
sounds on lower side.

> +int parallel_seqscan_degree = 0;
>
> I think we should have a GUC for the maximum degree of parallelism in
> a query generally, not the maximum degree of parallel sequential scan.
>

How about degree_of_parallelism?

Will change the patch if you are okay with the proposed name.

> + if (parallel_seqscan_degree >= MaxConnections)
> + {
> + write_stderr("%s: parallel_scan_degree must be less than
> max_connections\n", progname);
> + ExitPostmaster(1);
> + }
>
> I think this check is thoroughly unnecessary. It's comparing to the
> wrong thing anyway, because what actually matters is
> max_worker_processes, not max_connections. But in any case there is
> no need for the check. If somebody stupidly tries an unreasonable
> value for the maximum degree of parallelism, they won't get that many
> workers, but nothing will break. It's no worse than setting any other
> query planner costing parameter to an insane value.
>

Removed this check.

> --- a/src/include/access/heapam.h
> +++ b/src/include/access/heapam.h
> @@ -126,6 +126,7 @@ extern void heap_rescan_set_params(HeapScanDesc
> scan, ScanKey key,
> extern void heap_endscan(HeapScanDesc scan);
> extern HeapTuple heap_getnext(HeapScanDesc scan, ScanDirection
direction);
>
> +
> extern bool heap_fetch(Relation relation, Snapshot snapshot,
>
> Stray whitespace change.
>

Removed.

> This looks like another instance of using the walker incorrectly.
> Nodes where you just want to let the walk continue shouldn't need to
> be enumerated; dispatching like this should be the default case.

> + case T_Result:
> + fix_opfuncids((Node*) (((Result
> *)node)->resconstantqual));
> + break;

> Seems similarly wrong.

> I have not changed as this will completely go, once I will rebase it to
> your recent commit-9f1255ac.

> + * cost_patialseqscan

> Typo. The actual function name has the same typo.

Changed as per your suggestion.

> + num_parallel_workers = parallel_seqscan_degree;
> + if (parallel_seqscan_degree <= estimated_parallel_workers)
> + num_parallel_workers = parallel_seqscan_degree;
> + else
> + num_parallel_workers = estimated_parallel_workers;

> Use Min?

Okay, that better and changed it accordingly.

Fixed the below Comments by Hari

> + if (inst_options)
> + {
> + instoptions = shm_toc_lookup(toc, PARALLEL_KEY_INST_OPTIONS);
> + *inst_options = *instoptions;
> + if (inst_options)

> Same pointer variable check, it should be if (*inst_options) as per the
> estimate and store functions.

Fixed.

> + if (funnelstate->ss.ps.ps_ProjInfo)
> + slot = funnelstate->ss.ps.ps_ProjInfo->pi_slot;
> + else
> + slot = funnelstate->ss.ss_ScanTupleSlot;

> Currently, there will not be a projinfo for funnel node. So always it uses
> the scan tuple slot. In case if it is different, we need to add the
ExecProject
> call in ExecFunnel function. Currently it is not present, either we can
document
> it or add the function call.

Added an appropriate comment as discussed upthread.

> + if (parallel_seqscan_degree >= MaxConnections)
> + {
> + write_stderr("%s: parallel_scan_degree must be less than
> max_connections\n", progname);
> + ExitPostmaster(1);
> + }

> The error condition works only during server start. User still can set
> parallel seqscan degree
> more than max connection at super user session level and etc.

removed this check.

> + if (!parallelstmt->inst_options)
> + (*receiver->rDestroy) (receiver);

> Why only when there is no instruementation only, the receiver needs to
> be destroyed?

It should be destroyed unconditionally.

While commiting tqueue related changes (commit- 4a4e6893), you have
changed the signature of below API:
+CreateTupleQueueDestReceiver(shm_mq_handle *handle).

I am not sure how we can pass handle to this API without changing
prototype of other API's or may be use this directly instead of
CreateDestReceiver(). Also at other places we already use Set
API's to set the receiver params, so for now I have defined the
Set API to achieve the same.

So to summarize, the following are the main items which are still left:

1. Rename Funnel to Gather, per discussion.

2. Add an ID to each plan node and use that ID as the DSM key.

I feel this needs some more discussion.

3. Currently subplans are not passed to workers, so any query which
generates subplan that needs to be pushed to worker is going to have
unpredictable behaviour. This needs some more discussion.

4. Below comment by Robert -
I would expect EXPLAIN should show the # of workers planned, and
EXPLAIN ANALYZE should show both the planned and actual values.

5. Serialization of bind parameters might need some work[1]

[1] -
http://www.postgresql.org/message-id/CA+TgmoZevF_DAhqbo4j7VhwmEzSZT3wprZviqW4zvao1qgC_wA@mail.gmail.com

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment	Content-Type	Size
parallel_seqscan_funnel_v18.patch	application/octet-stream	104.4 KB
parallel_seqscan_partialseqscan_v18.patch	application/octet-stream	50.4 KB

In response to

Re: Parallel Seq Scan at 2015-09-16 20:58:33 from Robert Haas

Responses

Re: Parallel Seq Scan at 2015-09-25 03:20:33 from Robert Haas
Re: Parallel Seq Scan at 2015-09-30 01:35:17 from Robert Haas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Stephen Frost	2015-09-24 19:19:42	Re: No Issue Tracker - Say it Ain't So!
Previous Message	David Fetter	2015-09-24 18:23:10	Re: No Issue Tracker - Say it Ain't So!