Re: [HACKERS] parallelize queries containing initplans

From: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] parallelize queries containing initplans
Date: 2017-11-16 10:23:47
Message-ID: CAGz5QC+Mo0Ky8ycs1jNxAoKi-b36mY2B=-nLN7WCxD0M5KTP6g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 16, 2017 at 3:34 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Wed, Nov 15, 2017 at 12:25 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Tue, Nov 14, 2017 at 11:00 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Yeah, I'm sure it is. I have a vague recollection that there might be
>>> existing regression test cases exercising such things, though I won't
>>> swear to that. The "orderstest" bit in subselect.sql looks like it
>>> might be meant to do that...
>>
>
> I agree that such cases can cause a problem with fixed memory.
>
I'm able to come up with a test case to produce the problem.

regression[107494]=# select ten,count(*) from tenk1 a where a.ten in (select
b.ten from tenk1 b where (select a.ten from tenk1 c where c.ten =
a.ten limit 1) = b.ten limit 1) group by a.ten;

QUERY PLAN
---------------------------------------------------
HashAggregate
Group Key: a.ten
-> Seq Scan on tenk1 a
Filter: (SubPlan 2)
SubPlan 2
-> Limit
InitPlan 1 (returns $1)
-> Limit
-> Seq Scan on tenk1 c
Filter: (ten = a.ten)
-> Seq Scan on tenk1 b
Filter: ($1 = ten)
In the above case, Subplan 2 returns the same value of a.ten that is
used in initplan as a reference. So, this query is nothing but select
ten,count(*) from tenk1 group by a.ten. The output should be 10 rows
each row having a count=1000.

By enforcing parallelism, I got the following plan:
QUERY PLAN
------------------------------------------------------
HashAggregate
Group Key: a.ten
-> Seq Scan on tenk1 a
Filter: (SubPlan 2)
SubPlan 2
-> Limit
InitPlan 1 (returns $1)
-> Limit
-> Seq Scan on tenk1 c
Filter: (ten = a.ten)
-> Gather
Workers Planned: 2
Params Evaluated: $1
-> Parallel Seq Scan on tenk1 b
Filter: ($1 = ten)

When I set parallel_leader_participation to off, I get the incorrect
result as follows:
ten | count
------------
0 | 1000
(1 row)
In the above case, the initplan gets evaluated only once from
ExecInitParallelPlan path. Hence, we see the incorrect result.

>> Here's an updated patch that attempts to work around this problem using DSA.
>>
>
> There were a couple of problems with your changes:
> 1.
> BufferUsage *buffer_usage; /* points to bufusage area in DSM */
> + dsa_handle param_exec; /* serialized PARAM_EXEC parameters */
> @@ -35,12 +36,13 @@ typedef struct ParallelExecutorInfo
> } ParallelExecutorInfo;
>
> This should be dsa_pointer, otherwise, the value returned by
> SerializeParamExecParams will get truncated.
>
> 2. In ExecParallelReinitialize(), we need to evaluate the params
> before serializing them.
>
> 3. I think we should free the dsa pointer at the end of the gather.
>
> Attached patch fixes the mentioned problems.
>
>> It could use a test case that actually tickles the new logic in
>> ExecParallelReinitialize ... this is mostly just to show the concept.
>>
>
> Thanks, it was quite helpful.
>
I've tested the above-mentioned scenario with this patch and it is
working fine. Also, I've created a text column named 'vartext',
inserted some random length texts(max length 100) and tweaked the
above query as follows:
select ten,count(*) from tenk1 a where a.ten in (select
b.ten from tenk1 b where (select a.vartext from tenk1 c where c.ten =
a.ten limit 1) = b.vartext limit 1) group by a.ten;
This query is equivalent to select ten,count(*) from tenk1 group by
a.ten. It also produced the expected result without throwing any
error.

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ideriha, Takeshi 2017-11-16 10:31:54 RE: [HACKERS] [WIP] RE: DECLARE STATEMENT setting up a connection in ECPG
Previous Message Amit Kapila 2017-11-16 10:04:29 Re: [HACKERS] parallelize queries containing initplans