Re: variation of row_number with parallel

From: Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: variation of row_number with parallel
Date: 2020-04-14 04:18:42
Message-ID: CAKcux6nYOpZLtx8BpZPAX0iet2Vid5MdBi1TbD5rYP+f9mamTQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 14, 2020 at 9:39 AM Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
wrote:

>
>
> út 14. 4. 2020 v 5:59 odesílatel Rajkumar Raghuwanshi <
> rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com> napsal:
>
>> Hi,
>>
>> I have observed row_number() is giving different results when query
>> executed in parallel. is this expected w.r.t parallel execution.
>>
>> CREATE TABLE tbl1 (c1 INT) partition by list (c1);
>> CREATE TABLE tbl1_p1 partition of tbl1 FOR VALUES IN (10);
>> CREATE TABLE tbl1_p2 partition of tbl1 FOR VALUES IN (20);
>> CREATE TABLE tbl1_p3 partition of tbl1 FOR VALUES IN (30);
>>
>> CREATE TABLE tbl2 (c1 INT, c2 INT,c3 INT) partition by list (c1);
>> CREATE TABLE tbl2_p1 partition of tbl2 FOR VALUES IN (1);
>> CREATE TABLE tbl2_p2 partition of tbl2 FOR VALUES IN (2);
>> CREATE TABLE tbl2_p3 partition of tbl2 FOR VALUES IN (3);
>> CREATE TABLE tbl2_p4 partition of tbl2 FOR VALUES IN (4);
>> CREATE TABLE tbl2_p5 partition of tbl2 FOR VALUES IN (5);
>>
>> INSERT INTO tbl1 VALUES (10),(20),(30);
>>
>> INSERT INTO tbl2 VALUES
>> (1,100,20),(2,200,10),(3,100,20),(4,100,30),(5,100,10);
>>
>> postgres=# explain select e.c2, row_number() over () from tbl1 d, tbl2 e
>> where d.c1=e.c3;
>> QUERY PLAN
>>
>>
>> ---------------------------------------------------------------------------------------
>> WindowAgg (cost=1520.35..12287.73 rows=390150 width=12)
>> -> Merge Join (cost=1520.35..7410.85 rows=390150 width=4)
>> Merge Cond: (d.c1 = e.c3)
>> -> Sort (cost=638.22..657.35 rows=7650 width=4)
>> Sort Key: d.c1
>> -> Append (cost=0.00..144.75 rows=7650 width=4)
>> -> Seq Scan on tbl1_p1 d_1 (cost=0.00..35.50
>> rows=2550 width=4)
>> -> Seq Scan on tbl1_p2 d_2 (cost=0.00..35.50
>> rows=2550 width=4)
>> -> Seq Scan on tbl1_p3 d_3 (cost=0.00..35.50
>> rows=2550 width=4)
>> -> Sort (cost=882.13..907.63 rows=10200 width=8)
>> Sort Key: e.c3
>> -> Append (cost=0.00..203.00 rows=10200 width=8)
>> -> Seq Scan on tbl2_p1 e_1 (cost=0.00..30.40
>> rows=2040 width=8)
>> -> Seq Scan on tbl2_p2 e_2 (cost=0.00..30.40
>> rows=2040 width=8)
>> -> Seq Scan on tbl2_p3 e_3 (cost=0.00..30.40
>> rows=2040 width=8)
>> -> Seq Scan on tbl2_p4 e_4 (cost=0.00..30.40
>> rows=2040 width=8)
>> -> Seq Scan on tbl2_p5 e_5 (cost=0.00..30.40
>> rows=2040 width=8)
>> (17 rows)
>>
>> postgres=# select e.c2, row_number() over () from tbl1 d, tbl2 e where
>> d.c1=e.c3;
>> c2 | row_number
>> -----+------------
>> *200 | 1*
>> 100 | 2
>> 100 | 3
>> 100 | 4
>> 100 | 5
>> (5 rows)
>>
>> postgres=#
>> postgres=# set parallel_setup_cost = 0;
>> SET
>> postgres=# set parallel_tuple_cost = 0;
>> SET
>> postgres=#
>> postgres=# explain select e.c2, row_number() over () from tbl1 d, tbl2 e
>> where d.c1=e.c3;
>> QUERY PLAN
>>
>>
>> ------------------------------------------------------------------------------------------------------
>> WindowAgg (cost=130.75..7521.21 rows=390150 width=12)
>> -> Gather (cost=130.75..2644.34 rows=390150 width=4)
>> Workers Planned: 2
>> -> Parallel Hash Join (cost=130.75..2644.34 rows=162562
>> width=4)
>> Hash Cond: (e.c3 = d.c1)
>> -> Parallel Append (cost=0.00..131.25 rows=4250 width=8)
>> -> Parallel Seq Scan on tbl2_p1 e_1
>> (cost=0.00..22.00 rows=1200 width=8)
>> -> Parallel Seq Scan on tbl2_p2 e_2
>> (cost=0.00..22.00 rows=1200 width=8)
>> -> Parallel Seq Scan on tbl2_p3 e_3
>> (cost=0.00..22.00 rows=1200 width=8)
>> -> Parallel Seq Scan on tbl2_p4 e_4
>> (cost=0.00..22.00 rows=1200 width=8)
>> -> Parallel Seq Scan on tbl2_p5 e_5
>> (cost=0.00..22.00 rows=1200 width=8)
>> -> Parallel Hash (cost=90.93..90.93 rows=3186 width=4)
>> -> Parallel Append (cost=0.00..90.93 rows=3186
>> width=4)
>> -> Parallel Seq Scan on tbl1_p1 d_1
>> (cost=0.00..25.00 rows=1500 width=4)
>> -> Parallel Seq Scan on tbl1_p2 d_2
>> (cost=0.00..25.00 rows=1500 width=4)
>> -> Parallel Seq Scan on tbl1_p3 d_3
>> (cost=0.00..25.00 rows=1500 width=4)
>> (16 rows)
>>
>> postgres=# select e.c2, row_number() over () from tbl1 d, tbl2 e where
>> d.c1=e.c3;
>> c2 | row_number
>> -----+------------
>> 100 | 1
>> 100 | 2
>> 100 | 3
>> *200 | 4*
>> 100 | 5
>> (5 rows)
>>
>
> there are not ORDER BY clause, so order is not defined - paralel hash join
> surely doesn't ensure a order.
> I think so this behave is expected.
>
thanks.

>
> Regards
>
> Pavel
>
>
>> Thanks & Regards,
>> Rajkumar Raghuwanshi
>>
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2020-04-14 05:23:37 Re: Poll: are people okay with function/operator table redesign?
Previous Message Pavel Stehule 2020-04-14 04:08:29 Re: variation of row_number with parallel