Re: parallelize queries containing initplans

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallelize queries containing initplans
Date: 2017-10-04 07:25:30
Message-ID: CAA4eK1KThoT0choY6QYaNyP+oCRS27MRA+0r8rQt69Oauz_OtQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 4, 2017 at 3:40 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Oct 3, 2017 at 7:33 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> Having said all that, I think that this patch only wants to handle the
> subset of cases (2) and (4) where the relevant InitPlan is attached
> ABOVE the Gather node -- which seems very reasonable, because
> evaluating an InitPlan at a level of the plan tree above the level at
> which it is defined sounds like it might be complex. But I still
> don't quite see why we need these tests. I mean, if we only allow
> Param references from a set of safe parameter IDs, and the safe
> parameter IDs include only those IDs that can be generated in a
> worker, then won't other InitPlans in the workers anyway be ruled out?

It doesn't happen always. There are cases when the part of required
conditions are pushed one query level below, so when we check during
max_parallel_hazard_walker, they look safe, but actually, they are
not. I will try to explain by example:

postgres=# explain (costs off, verbose) select * from t1 where t1.i in
( select 1 + (select max(j) from t3));
QUERY PLAN
----------------------------------------------------------------------
Hash Semi Join
Output: t1.i, t1.j, t1.k
Hash Cond: (t1.i = ((1 + $1)))
-> Seq Scan on public.t1
Output: t1.i, t1.j, t1.k
-> Hash
Output: ((1 + $1))
-> Result
Output: (1 + $1)
InitPlan 1 (returns $1)
-> Finalize Aggregate
Output: max(t3.j)
-> Gather
Output: (PARTIAL max(t3.j))
Workers Planned: 2
-> Partial Aggregate
Output: PARTIAL max(t3.j)
-> Parallel Seq Scan on public.t3
Output: t3.j
(19 rows)

In the above example, you can see that the condition referring to
initplan (1 + $1) is pushed one level below. So when it tries to
check parallel safety for the join condition, it won't see Param node.
Now, consider if we don't check contains_parallel_unsafe_param during
generate_gather_paths, then it will lead to below plan.

postgres=# explain (costs off, verbose) select * from t1 where t1.i in
( select 1 + (select max(j) from t3));
QUERY PLAN
----------------------------------------------------------------------------
Gather
Output: t1.i, t1.j, t1.k
Workers Planned: 2
-> Hash Semi Join
Output: t1.i, t1.j, t1.k
Hash Cond: (t1.i = ((1 + $1)))
-> Parallel Seq Scan on public.t1
Output: t1.i, t1.j, t1.k
-> Hash
Output: ((1 + $1))
-> Result
Output: (1 + $1)
InitPlan 1 (returns $1)
-> Finalize Aggregate
Output: max(t3.j)
-> Gather
Output: (PARTIAL max(t3.j))
Workers Planned: 2
-> Partial Aggregate
Output: PARTIAL max(t3.j)
-> Parallel Seq Scan on public.t3
Output: t3.j
(22 rows)

This is wrong because when we will try to evaluate params that are
required at gather node, we won't get the required param as there is
no initplan at that level.

>
> If I am all mixed up, please help straighten me out.
>

I think whatever you said is right and very clear.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2017-10-04 07:27:48 Re: parallelize queries containing initplans
Previous Message Laurenz Albe 2017-10-04 06:51:21 Re: list of credits for release notes