Re: variation of row_number with parallel

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: variation of row_number with parallel
Date: 2020-04-14 04:08:29
Message-ID: CAFj8pRD5g14BZfjEMGG=MmxvzepZSeiW9Ht_qPXv5MJ7LK0n3Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

út 14. 4. 2020 v 5:59 odesílatel Rajkumar Raghuwanshi <
rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com> napsal:

> Hi,
>
> I have observed row_number() is giving different results when query
> executed in parallel. is this expected w.r.t parallel execution.
>
> CREATE TABLE tbl1 (c1 INT) partition by list (c1);
> CREATE TABLE tbl1_p1 partition of tbl1 FOR VALUES IN (10);
> CREATE TABLE tbl1_p2 partition of tbl1 FOR VALUES IN (20);
> CREATE TABLE tbl1_p3 partition of tbl1 FOR VALUES IN (30);
>
> CREATE TABLE tbl2 (c1 INT, c2 INT,c3 INT) partition by list (c1);
> CREATE TABLE tbl2_p1 partition of tbl2 FOR VALUES IN (1);
> CREATE TABLE tbl2_p2 partition of tbl2 FOR VALUES IN (2);
> CREATE TABLE tbl2_p3 partition of tbl2 FOR VALUES IN (3);
> CREATE TABLE tbl2_p4 partition of tbl2 FOR VALUES IN (4);
> CREATE TABLE tbl2_p5 partition of tbl2 FOR VALUES IN (5);
>
> INSERT INTO tbl1 VALUES (10),(20),(30);
>
> INSERT INTO tbl2 VALUES
> (1,100,20),(2,200,10),(3,100,20),(4,100,30),(5,100,10);
>
> postgres=# explain select e.c2, row_number() over () from tbl1 d, tbl2 e
> where d.c1=e.c3;
> QUERY PLAN
>
>
> ---------------------------------------------------------------------------------------
> WindowAgg (cost=1520.35..12287.73 rows=390150 width=12)
> -> Merge Join (cost=1520.35..7410.85 rows=390150 width=4)
> Merge Cond: (d.c1 = e.c3)
> -> Sort (cost=638.22..657.35 rows=7650 width=4)
> Sort Key: d.c1
> -> Append (cost=0.00..144.75 rows=7650 width=4)
> -> Seq Scan on tbl1_p1 d_1 (cost=0.00..35.50
> rows=2550 width=4)
> -> Seq Scan on tbl1_p2 d_2 (cost=0.00..35.50
> rows=2550 width=4)
> -> Seq Scan on tbl1_p3 d_3 (cost=0.00..35.50
> rows=2550 width=4)
> -> Sort (cost=882.13..907.63 rows=10200 width=8)
> Sort Key: e.c3
> -> Append (cost=0.00..203.00 rows=10200 width=8)
> -> Seq Scan on tbl2_p1 e_1 (cost=0.00..30.40
> rows=2040 width=8)
> -> Seq Scan on tbl2_p2 e_2 (cost=0.00..30.40
> rows=2040 width=8)
> -> Seq Scan on tbl2_p3 e_3 (cost=0.00..30.40
> rows=2040 width=8)
> -> Seq Scan on tbl2_p4 e_4 (cost=0.00..30.40
> rows=2040 width=8)
> -> Seq Scan on tbl2_p5 e_5 (cost=0.00..30.40
> rows=2040 width=8)
> (17 rows)
>
> postgres=# select e.c2, row_number() over () from tbl1 d, tbl2 e where
> d.c1=e.c3;
> c2 | row_number
> -----+------------
> *200 | 1*
> 100 | 2
> 100 | 3
> 100 | 4
> 100 | 5
> (5 rows)
>
> postgres=#
> postgres=# set parallel_setup_cost = 0;
> SET
> postgres=# set parallel_tuple_cost = 0;
> SET
> postgres=#
> postgres=# explain select e.c2, row_number() over () from tbl1 d, tbl2 e
> where d.c1=e.c3;
> QUERY PLAN
>
>
> ------------------------------------------------------------------------------------------------------
> WindowAgg (cost=130.75..7521.21 rows=390150 width=12)
> -> Gather (cost=130.75..2644.34 rows=390150 width=4)
> Workers Planned: 2
> -> Parallel Hash Join (cost=130.75..2644.34 rows=162562 width=4)
> Hash Cond: (e.c3 = d.c1)
> -> Parallel Append (cost=0.00..131.25 rows=4250 width=8)
> -> Parallel Seq Scan on tbl2_p1 e_1
> (cost=0.00..22.00 rows=1200 width=8)
> -> Parallel Seq Scan on tbl2_p2 e_2
> (cost=0.00..22.00 rows=1200 width=8)
> -> Parallel Seq Scan on tbl2_p3 e_3
> (cost=0.00..22.00 rows=1200 width=8)
> -> Parallel Seq Scan on tbl2_p4 e_4
> (cost=0.00..22.00 rows=1200 width=8)
> -> Parallel Seq Scan on tbl2_p5 e_5
> (cost=0.00..22.00 rows=1200 width=8)
> -> Parallel Hash (cost=90.93..90.93 rows=3186 width=4)
> -> Parallel Append (cost=0.00..90.93 rows=3186
> width=4)
> -> Parallel Seq Scan on tbl1_p1 d_1
> (cost=0.00..25.00 rows=1500 width=4)
> -> Parallel Seq Scan on tbl1_p2 d_2
> (cost=0.00..25.00 rows=1500 width=4)
> -> Parallel Seq Scan on tbl1_p3 d_3
> (cost=0.00..25.00 rows=1500 width=4)
> (16 rows)
>
> postgres=# select e.c2, row_number() over () from tbl1 d, tbl2 e where
> d.c1=e.c3;
> c2 | row_number
> -----+------------
> 100 | 1
> 100 | 2
> 100 | 3
> *200 | 4*
> 100 | 5
> (5 rows)
>

there are not ORDER BY clause, so order is not defined - paralel hash join
surely doesn't ensure a order.

I think so this behave is expected.

Regards

Pavel

> Thanks & Regards,
> Rajkumar Raghuwanshi
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Rajkumar Raghuwanshi 2020-04-14 04:18:42 Re: variation of row_number with parallel
Previous Message Masahiko Sawada 2020-04-14 04:06:14 Re: Race condition in SyncRepGetSyncStandbysPriority