Re: Possible regression with gather merge.

From: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>
To: Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robert(dot)haas(at)enterprisedb(dot)com>, Rushabh Lathia <rushabh(dot)lathia(at)enterprisedb(dot)com>
Subject: Re: Possible regression with gather merge.
Date: 2017-03-22 06:35:10
Message-ID: CAGPqQf3GafRitDZC=BrTG7t8GrM8OE_ZfddzxtWOqdy_ZT7GCA@mail.gmail.com
Lists: pgsql-hackers

Thanks for reporting, I am looking into this.

On Wed, Mar 22, 2017 at 11:51 AM, Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>
wrote:

> Adding more rows to the table makes gather merge execution very slow
> compared to the non-parallel plan we get after disabling gather
> merge.
>
> create table test as (select id, (random()*10000)::int as v1, random() as
> v2 from generate_series(1,100000000) id);
>
> postgres=# set max_parallel_workers_per_gather = default;
> SET
> postgres=# explain analyze select * from test order by v1, v2 limit 10;
>                                                                    QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=1858610.53..1858611.70 rows=10 width=16) (actual time=31103.880..31103.885 rows=10 loops=1)
>    ->  Gather Merge  (cost=1858610.53..11581520.05 rows=83333406 width=16) (actual time=31103.878..31103.882 rows=10 loops=1)
>          Workers Planned: 2
>          Workers Launched: 2
>          ->  Sort  (cost=1857610.50..1961777.26 rows=41666703 width=16) (actual time=30560.865..30561.046 rows=911 loops=3)
>                Sort Key: v1, v2
>                Sort Method: external merge  Disk: 841584kB
>                ->  Parallel Seq Scan on test  (cost=0.00..957208.03 rows=41666703 width=16) (actual time=0.050..2330.275 rows=33333333 loops=3)
>  Planning time: 0.292 ms
>  Execution time: 31502.896 ms
> (10 rows)
>
> postgres=# set max_parallel_workers_per_gather = 0;
> SET
> postgres=# explain analyze select * from test order by v1, v2 limit 10;
>                                                             QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=3701507.83..3701507.85 rows=10 width=16) (actual time=13231.264..13231.266 rows=10 loops=1)
>    ->  Sort  (cost=3701507.83..3951508.05 rows=100000088 width=16) (actual time=13231.261..13231.262 rows=10 loops=1)
>          Sort Key: v1, v2
>          Sort Method: top-N heapsort  Memory: 25kB
>          ->  Seq Scan on test  (cost=0.00..1540541.88 rows=100000088 width=16) (actual time=0.045..6759.363 rows=100000000 loops=1)
>  Planning time: 0.131 ms
>  Execution time: 13231.299 ms
> (7 rows)
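[Editorial note: to make the mismatch concrete, here is a quick sanity check of the figures quoted in the two 100M-row EXPLAIN ANALYZE outputs above; the numbers are copied verbatim from those plans, and this arithmetic is illustrative only.]

```python
# Numbers copied from the two EXPLAIN ANALYZE outputs above (100M-row table).
# Planner costs are in abstract cost units; times are in milliseconds.
gm_cost, gm_ms = 1858611.70, 31502.896          # gather merge plan
serial_cost, serial_ms = 3701507.85, 13231.299  # serial top-N heapsort plan

# The planner picks gather merge because its estimated cost is about half
# of the serial plan's estimate...
assert gm_cost < serial_cost
# ...yet the serial plan actually finishes roughly 2.4x faster.
print(f"cost ratio (serial/gm): {serial_cost / gm_cost:.2f}")
print(f"time ratio (gm/serial): {gm_ms / serial_ms:.2f}")
```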
>
> On Wed, Mar 22, 2017 at 11:07 AM, Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>
> wrote:
> > I accidentally encountered a case where gather merge was picked by
> > default, but disabling it by setting max_parallel_workers_per_gather
> > = 0 produced a non-parallel plan which was faster than gather merge,
> > even though its cost is estimated much higher than gather merge's.
> >
> > I guess we need some cost adjustment in the planner code.
> >
> > Test setting
> > =========
> > create table test as (select id, (random()*10000)::int as v1, random() as
> > v2 from generate_series(1,1000000) id);
> > create index test_v1_idx on test (v1);
> >
> >
> > Server setting is default.
> >
> >
> > postgres=# explain analyze select * from test order by v1, v2 limit 10;
> >                                                              QUERY PLAN
> > -----------------------------------------------------------------------------------------------------------------------------------
> >  Limit  (cost=19576.71..19577.88 rows=10 width=16) (actual time=265.989..265.995 rows=10 loops=1)
> >    ->  Gather Merge  (cost=19576.71..116805.80 rows=833334 width=16) (actual time=265.987..265.992 rows=10 loops=1)
> >          Workers Planned: 2
> >          Workers Launched: 2
> >          ->  Sort  (cost=18576.69..19618.36 rows=416667 width=16) (actual time=250.202..250.424 rows=911 loops=3)
> >                Sort Key: v1, v2
> >                Sort Method: external merge  Disk: 9272kB
> >                ->  Parallel Seq Scan on test  (cost=0.00..9572.67 rows=416667 width=16) (actual time=0.053..41.397 rows=333333 loops=3)
> >  Planning time: 0.193 ms
> >  Execution time: 271.222 ms
> >
> > postgres=# set max_parallel_workers_per_gather = 0;
> > SET
> > postgres=# explain analyze select * from test order by v1, v2 limit 10;
> >                                                          QUERY PLAN
> > ---------------------------------------------------------------------------------------------------------------------------
> >  Limit  (cost=37015.64..37015.67 rows=10 width=16) (actual time=211.582..211.584 rows=10 loops=1)
> >    ->  Sort  (cost=37015.64..39515.64 rows=1000000 width=16) (actual time=211.581..211.582 rows=10 loops=1)
> >          Sort Key: v1, v2
> >          Sort Method: top-N heapsort  Memory: 25kB
> >          ->  Seq Scan on test  (cost=0.00..15406.00 rows=1000000 width=16) (actual time=0.085..107.522 rows=1000000 loops=1)
> >  Planning time: 0.093 ms
> >  Execution time: 211.608 ms
> > (7 rows)
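[Editorial note: the same cost/time mismatch shows up in this 1M-row case, though less severely, which is consistent with the report that the regression worsens as rows are added. The numbers below are copied verbatim from the 1M-row plans above; the arithmetic is illustrative only.]

```python
# Numbers copied from the two EXPLAIN ANALYZE outputs above (1M-row table).
gm_cost, gm_ms = 19577.88, 271.222          # gather merge plan
serial_cost, serial_ms = 37015.67, 211.608  # serial top-N heapsort plan

# Gather merge again looks cheaper to the planner but runs slower in
# practice; here the slowdown is only ~1.3x, versus ~2.4x on the
# 100M-row table, so the gap grows with table size.
assert gm_cost < serial_cost and gm_ms > serial_ms
print(f"time ratio (gm/serial): {gm_ms / serial_ms:.2f}")
```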
> >
> >
> >
> > --
> > Thanks and Regards
> > Mithun C Y
> > EnterpriseDB: http://www.enterprisedb.com
>
>
>
> --
> Thanks and Regards
> Mithun C Y
> EnterpriseDB: http://www.enterprisedb.com
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

--
Rushabh Lathia
