Re: [HACKERS] parallelize queries containing initplans

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] parallelize queries containing initplans
Date: 2017-11-14 05:00:07
Message-ID: CAA4eK1KKhMVKjAQrsUx1Yjh9P1cBcYL0EygFb9-bCSpb-HrAqQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 14, 2017 at 2:39 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sat, Nov 11, 2017 at 7:19 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> I have tried to follow the practice we have used for param extern
>> params (SerializeParamList is in params.c) and most of the handling of
>> initplans is done in nodeSubplan.c, so I choose to keep the newly
>> added functions there. However, I think keeping it in execParallel.c
>> is also okay as we do it for serialize plan.
>
> To me it feels like there is only a very loose coupling between this
> code and what's currently in nodeSubplan.c, but maybe I'm wrong. I am
> also not sure execParallel.c is the perfect place either; we might
> eventually split execParallel.c into multiple files if it accumulates
> too many things that are too different from each other.
>

Another possibility is subselect.c, because it contains initplan
stuff, however, most of the things are related to planning, so not
sure. I think nodeSubplan.c won't be a bad choice as it is mentioned
in the file header that it is concerned about initplan stuff.

>> I think it would need some work in execution as well because the size
>> won't be fixed in that case for varchar type of params. We might end
>> up with something different as well.
>
> Hmm, true. But don't we also have that problem with the patch as
> written? I mean, didn't we determine at some point that the value of
> an InitPlan can change, if the initplan depends on a parameter that is
> correlated but not directly correlated? The existence of
> ExecReScanSetParamPlan() seems to suggest that this is indeed
> possible, and that means that the parameter could be reevaluated and
> evaluate, on the second time through, to a value that requires more
> storage than before. That would be a problem, because
> ExecParallelReInitializeDSM reuses the old DSM.
>

I think this would have been a matter of concern if we use initplans
below Gather/Gather Merge. The patch uses initplans which are between
current query level and root. So, I think we don't need to reevaluate
such parameters. Let us try to see it via example:

QUERY PLAN
----------------------------------------------------------------------------------------
Aggregate (cost=62.08..62.08 rows=1 width=8)
InitPlan 1 (returns $1)
-> Finalize Aggregate (cost=20.64..20.65 rows=1 width=8)
-> Gather (cost=20.63..20.64 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=20.63..20.64 rows=1 width=8)
-> Parallel Seq Scan on t3 (cost=0.00..18.50
rows=850 width=4)
-> Hash Join (cost=20.75..41.42 rows=1 width=4)
Hash Cond: (t1.j = t2.j)
-> Gather (cost=0.00..20.63 rows=10 width=12)
Workers Planned: 2
Params Evaluated: $1
-> Parallel Seq Scan on t1 (cost=0.00..20.63 rows=4 width=12)
Filter: (k = $1)
-> Hash (cost=20.63..20.63 rows=10 width=8)
-> Gather (cost=0.00..20.63 rows=10 width=8)
Workers Planned: 2
Params Evaluated: $1
-> Parallel Seq Scan on t2 (cost=0.00..20.63
rows=4 width=8)
Filter: (k = $1)
(20 rows)

Now, here even if initplan would have been an undirect correlated
plan, it wouldn't have been a problem, because we don't need to
reevaluate it for Gather node below Hash.

Am I missing something? Do you have some test or shape of the plan in
mind which can cause a problem?

>>> I broke a lot of long lines in your version of
>>> the patch into multiple lines; please try to be attentive to this
>>> issue when writing patches in general, as it is a bit tedious to go
>>> through and insert line breaks in many places.
>>
>> Okay, but I sometimes rely on pgindent for such things as for few
>> things it becomes difficult to decide which way it will be better.
>
> pgindent doesn't in general break long lines. There are a few cases
> where it does, like reflowing comments. But mostly you have to do
> that manually. For instance, if you write a =
> verylongfunctionname(longargument(thing), stuff(thing), foobar(thing,
> thing, thing)) it's going to keep all that on one line. If you insert
> line breaks between the arguments it will keep them, though. I think
> it's worth doing something like:
>
> git diff | awk 'length($0)>81'
>
> on your patches before you submit them. If you see things in there
> that can be reformatted to make the long lines shorter, do it.
>

Okay, point noted.

>> I have fixed the first two in attached patch and left the last one as
>> I was not sure what you have in mind
>
> Oh, yeah, that should be changed too, something like "Serialize
> PARAM_EXEC parameters." Sorry, I thought I caught all of the
> references to initplans specifically, but I guess I missed a few.
>

No issues, I can go through it once more to change the comments
wherever initplans is used after we settle on the above discussion.
If we decide that initplan and other PARAM_EXEC params need different
treatment, then we might want to retain initplans in the comments.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrea Adami 2017-11-14 05:09:08 Re: [HACKERS] Row Level Security Bug ?
Previous Message Peter Geoghegan 2017-11-14 04:08:42 Re: [HACKERS] ginInsertCleanup called from vacuum could still miss tuples to be deleted