Re: parallelize queries containing initplans

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallelize queries containing initplans
Date: 2017-10-05 09:52:54
Message-ID: CAA4eK1L8PSgvHcBMfgAHzoeBKPcknQPDg_Bdii384efPdhrCeQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 4, 2017 at 12:55 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Wed, Oct 4, 2017 at 3:40 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Tue, Oct 3, 2017 at 7:33 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>
>> Having said all that, I think that this patch only wants to handle the
>> subset of cases (2) and (4) where the relevant InitPlan is attached
>> ABOVE the Gather node -- which seems very reasonable, because
>> evaluating an InitPlan at a level of the plan tree above the level at
>> which it is defined sounds like it might be complex. But I still
>> don't quite see why we need these tests. I mean, if we only allow
>> Param references from a set of safe parameter IDs, and the safe
>> parameter IDs include only those IDs that can be generated in a
>> worker, then won't other InitPlans in the workers anyway be ruled out?
>
> It doesn't happen always. There are cases when the part of required
> conditions are pushed one query level below, so when we check during
> max_parallel_hazard_walker, they look safe, but actually, they are
> not. I will try to explain by example:
>
> postgres=# explain (costs off, verbose) select * from t1 where t1.i in
> ( select 1 + (select max(j) from t3));
> QUERY PLAN
> ----------------------------------------------------------------------
> Hash Semi Join
> Output: t1.i, t1.j, t1.k
> Hash Cond: (t1.i = ((1 + $1)))
> -> Seq Scan on public.t1
> Output: t1.i, t1.j, t1.k
> -> Hash
> Output: ((1 + $1))
> -> Result
> Output: (1 + $1)
> InitPlan 1 (returns $1)
> -> Finalize Aggregate
> Output: max(t3.j)
> -> Gather
> Output: (PARTIAL max(t3.j))
> Workers Planned: 2
> -> Partial Aggregate
> Output: PARTIAL max(t3.j)
> -> Parallel Seq Scan on public.t3
> Output: t3.j
> (19 rows)
>
> In the above example, you can see that the condition referring to
> initplan (1 + $1) is pushed one level below. So when it tries to
> check parallel safety for the join condition, it won't see Param node.
>

To add more detail, the param node in join qual is replaced with Var
during pullup of sublink. Basically, it constructs vars from target
list of subquery and then replaces it in join quals. Refer code:
convert_ANY_sublink_to_join()
{
..
/*
* Build a list of Vars representing the subselect outputs.
*/
subquery_vars = generate_subquery_vars(root,
subselect->targetList,
rtindex);

/*
* Build the new join's qual expression, replacing Params with these Vars.
*/
quals = convert_testexpr(root, sublink->testexpr, subquery_vars);
..
}

Now, unless, I am missing something here, it won't be possible to
detect params in such cases during forming of join rels and hence we
need the tests in generate_gather_paths. Let me know if I am missing
something in this context or if you have any better ideas to make it
work?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2017-10-05 09:54:54 Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Previous Message Ashutosh Bapat 2017-10-05 09:38:08 Re: Partition-wise join for join between (declaratively) partitioned tables