From: | Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [HACKERS] parallelize queries containing initplans |
Date: | 2017-11-16 10:23:47 |
Message-ID: | CAGz5QC+Mo0Ky8ycs1jNxAoKi-b36mY2B=-nLN7WCxD0M5KTP6g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Nov 16, 2017 at 3:34 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Wed, Nov 15, 2017 at 12:25 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Tue, Nov 14, 2017 at 11:00 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Yeah, I'm sure it is. I have a vague recollection that there might be
>>> existing regression test cases exercising such things, though I won't
>>> swear to that. The "orderstest" bit in subselect.sql looks like it
>>> might be meant to do that...
>>
>
> I agree that such cases can cause a problem with fixed memory.
>
I'm able to come up with a test case to produce the problem.
regression[107494]=# select ten,count(*) from tenk1 a where a.ten in (select
b.ten from tenk1 b where (select a.ten from tenk1 c where c.ten =
a.ten limit 1) = b.ten limit 1) group by a.ten;
QUERY PLAN
---------------------------------------------------
HashAggregate
Group Key: a.ten
-> Seq Scan on tenk1 a
Filter: (SubPlan 2)
SubPlan 2
-> Limit
InitPlan 1 (returns $1)
-> Limit
-> Seq Scan on tenk1 c
Filter: (ten = a.ten)
-> Seq Scan on tenk1 b
Filter: ($1 = ten)
In the above case, Subplan 2 returns the same value of a.ten that is
used in initplan as a reference. So, this query is nothing but select
ten,count(*) from tenk1 group by a.ten. The output should be 10 rows
each row having a count=1000.
By enforcing parallelism, I got the following plan:
QUERY PLAN
------------------------------------------------------
HashAggregate
Group Key: a.ten
-> Seq Scan on tenk1 a
Filter: (SubPlan 2)
SubPlan 2
-> Limit
InitPlan 1 (returns $1)
-> Limit
-> Seq Scan on tenk1 c
Filter: (ten = a.ten)
-> Gather
Workers Planned: 2
Params Evaluated: $1
-> Parallel Seq Scan on tenk1 b
Filter: ($1 = ten)
When I set parallel_leader_participation to off, I get the incorrect
result as follows:
ten | count
------------
0 | 1000
(1 row)
In the above case, the initplan gets evaluated only once from
ExecInitParallelPlan path. Hence, we see the incorrect result.
>> Here's an updated patch that attempts to work around this problem using DSA.
>>
>
> There were a couple of problems with your changes:
> 1.
> BufferUsage *buffer_usage; /* points to bufusage area in DSM */
> + dsa_handle param_exec; /* serialized PARAM_EXEC parameters */
> @@ -35,12 +36,13 @@ typedef struct ParallelExecutorInfo
> } ParallelExecutorInfo;
>
> This should be dsa_pointer, otherwise, the value returned by
> SerializeParamExecParams will get truncated.
>
> 2. In ExecParallelReinitialize(), we need to evaluate the params
> before serializing them.
>
> 3. I think we should free the dsa pointer at the end of the gather.
>
> Attached patch fixes the mentioned problems.
>
>> It could use a test case that actually tickles the new logic in
>> ExecParallelReinitialize ... this is mostly just to show the concept.
>>
>
> Thanks, it was quite helpful.
>
I've tested the above-mentioned scenario with this patch and it is
working fine. Also, I've created a text column named 'vartext',
inserted some random length texts(max length 100) and tweaked the
above query as follows:
select ten,count(*) from tenk1 a where a.ten in (select
b.ten from tenk1 b where (select a.vartext from tenk1 c where c.ten =
a.ten limit 1) = b.vartext limit 1) group by a.ten;
This query is equivalent to select ten,count(*) from tenk1 group by
a.ten. It also produced the expected result without throwing any
error.
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Ideriha, Takeshi | 2017-11-16 10:31:54 | RE: [HACKERS] [WIP] RE: DECLARE STATEMENT setting up a connection in ECPG |
Previous Message | Amit Kapila | 2017-11-16 10:04:29 | Re: [HACKERS] parallelize queries containing initplans |