Re: parallelize queries containing initplans

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallelize queries containing initplans
Date: 2017-11-11 12:19:56
Message-ID: CAA4eK1Ljg3MgLETvvC-ZVNXXO+1Bn5y_QOD7ez_G+t5tg1jWDQ@mail.gmail.com
Lists: pgsql-hackers

On Sat, Nov 11, 2017 at 12:15 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Nov 7, 2017 at 4:45 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> As mentioned, changed the status of the patch in CF app.
>
> I spent some time reviewing this patch today and found myself still
> quite uncomfortable with the fact that it was adding execution-time
> work to track the types of parameters - types that would usually not
> even be used. I found the changes to nodeNestLoop.c to be
> particularly objectionable, because we could end up doing the work
> over and over when it is actually not needed at all, or at most once.
>

That's right, but we are only accessing the tuple descriptor to get
the type, so there shouldn't be much work involved in that. However, I
think your approach has the merit that we don't even need to do that
at execution time.

> I decided to try instead teaching the planner to keep track of the
> types of PARAM_EXEC parameters as they were created, and that seems to
> work fine. See 0001, attached.
>

This looks good to me.
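
To illustrate the idea of recording the type when a PARAM_EXEC
parameter is created, rather than deriving it at execution time, here
is a minimal standalone sketch. It is not code from 0001: the names
ParamTypeRegistry, registry_assign_param and registry_param_type are
hypothetical, and TypeOid is only a stand-in for Oid.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned int TypeOid;	/* stand-in for PostgreSQL's Oid (assumption) */

/* Hypothetical planner-side registry: slot i holds the type of param i. */
typedef struct ParamTypeRegistry
{
	TypeOid    *types;
	int			count;
	int			capacity;
} ParamTypeRegistry;

/*
 * Allocate a new parameter slot and remember its type.
 * Returns the new param id, or -1 if memory could not be allocated.
 */
int
registry_assign_param(ParamTypeRegistry *reg, TypeOid type)
{
	if (reg->count == reg->capacity)
	{
		int			newcap = reg->capacity ? reg->capacity * 2 : 8;
		TypeOid    *tmp = realloc(reg->types, newcap * sizeof(TypeOid));

		if (tmp == NULL)
			return -1;
		reg->types = tmp;
		reg->capacity = newcap;
	}
	reg->types[reg->count] = type;
	return reg->count++;
}

/* Later lookup is a plain array access; no tuple descriptor is consulted. */
TypeOid
registry_param_type(const ParamTypeRegistry *reg, int paramid)
{
	assert(paramid >= 0 && paramid < reg->count);
	return reg->types[paramid];
}
```

The point of the sketch is only that the type is stored once, when the
parameter id is handed out, so nothing has to be recomputed per rescan.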

> 0002, attached, is my worked-over version of the rest of the patch. I
> moved the code that serializes and deserializes PARAM_EXEC from
> nodeSubplan.c -- which seemed like a strange choice - to
> execParallel.c.
>

I tried to follow the practice we have used for PARAM_EXTERN params
(SerializeParamList is in params.c), and since most of the handling of
initplans is done in nodeSubplan.c, I chose to keep the newly added
functions there. However, I think keeping them in execParallel.c is
also okay, as we do the same for serializing the plan.

> I removed the type OID from the serialization format
> because there's no reason to do that any more; the worker already
> knows the types from the plan. I did some renaming of the functions
> involved and some adjustment of the comments to refer to "PARAM_EXEC
> parameters" instead of initPlan parameters, because there's no reason
> that I can see why this can only work for initPlans. A Gather node on
> the inner side of a nested loop doesn't sound like a great idea, but I
> think this infrastructure could handle it (though it would need some
> more planner work).
>

I think it would need some work in the executor as well, because in
that case the size won't be fixed for varchar-type params. We might
end up with something different as well.
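
To make the size concern concrete, here is a rough standalone sketch
of a format that writes a length header before each value and no type
OID, since the reader is assumed to know the types from the plan. It
is not the format used by the patch: ParamValue and the params_*
functions are hypothetical names, and the header convention (-1 for
NULL) is an assumption made only for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one parameter value (not ParamExecData). */
typedef struct ParamValue
{
	int			isnull;
	int32_t		len;		/* payload bytes; varies for varchar-like types */
	const char *data;
} ParamValue;

/* Space needed: a count, then a length header plus payload per value. */
size_t
params_estimate_space(const ParamValue *params, int nparams)
{
	size_t		size = sizeof(int32_t);

	for (int i = 0; i < nparams; i++)
	{
		size += sizeof(int32_t);
		if (!params[i].isnull)
			size += (size_t) params[i].len;
	}
	return size;
}

/* Write the values into buf; returns the number of bytes written. */
size_t
params_serialize(const ParamValue *params, int nparams, char *buf)
{
	char	   *p = buf;
	int32_t		n = nparams;

	memcpy(p, &n, sizeof(n));
	p += sizeof(n);
	for (int i = 0; i < nparams; i++)
	{
		int32_t		header = params[i].isnull ? -1 : params[i].len;

		assert(params[i].isnull || params[i].len >= 0);
		memcpy(p, &header, sizeof(header));
		p += sizeof(header);
		if (header > 0)
		{
			memcpy(p, params[i].data, (size_t) header);
			p += header;
		}
	}
	return (size_t) (p - buf);
}

/*
 * Read the values back; data pointers refer into buf.
 * Returns the stored count, or -1 if it exceeds maxparams.
 */
int
params_restore(const char *buf, ParamValue *out, int maxparams)
{
	const char *p = buf;
	int32_t		n;

	memcpy(&n, p, sizeof(n));
	p += sizeof(n);
	if (n > maxparams)
		return -1;
	for (int i = 0; i < n; i++)
	{
		int32_t		header;

		memcpy(&header, p, sizeof(header));
		p += sizeof(header);
		out[i].isnull = (header < 0);
		out[i].len = out[i].isnull ? 0 : header;
		out[i].data = out[i].isnull ? NULL : p;
		if (header > 0)
			p += header;
	}
	return n;
}
```

With values that change on every rescan, the estimate would have to be
recomputed each time, which is the part that a fixed-size segment set
up once does not cover.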

> I broke a lot of long lines in your version of
> the patch into multiple lines; please try to be attentive to this
> issue when writing patches in general, as it is a bit tedious to go
> through and insert line breaks in many places.
>

Okay, but I sometimes rely on pgindent for such things, as in a few
cases it is difficult to decide which way would be better.

> Please let me know your thoughts on the attached patches.
>

A few minor comments:
1.
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -23,6 +23,7 @@

#include "postgres.h"

+#include "executor/execExpr.h"
#include "executor/execParallel.h"
#include "executor/executor.h"
#include "executor/nodeBitmapHeapscan.h"
@@ -31,6 +32,7 @@
#include "executor/nodeIndexscan.h"
#include "executor/nodeIndexonlyscan.h"
#include "executor/nodeSeqscan.h"
+#include "executor/nodeSubplan.h"

This is not required if we move the serialize and other functions to
execParallel.c.

2.
+set_param_references(PlannerInfo *root, Plan *plan)
+{
+ Assert(IsA(plan, Gather) ||IsA(plan, GatherMerge));

I think there should be a space after the || operator.

3.
+/*
+ * Serialize ParamExecData params corresponding to initplans.
+ *
..
+/*
+ * Restore ParamExecData params corresponding to initplans.
+ */

Shouldn't we change the reference to initplans here as well?

I have fixed the first two in the attached patch and left the last one,
as I was not sure what you have in mind.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
0002-pq-pushdown-initplan-rebased-1.patch application/octet-stream 24.8 KB
