Re: Wrong plan with extra parallel workers

From: Guilherme Pereira <guilherme(at)guilherme-pereira(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Wrong plan with extra parallel workers
Date: 2018-04-27 09:47:23
Message-ID: CAC4tJuiWMEyo1CCOpf6GLcxHqANVZOthEBU6gFs4pNh6BOi_JQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Forgot to mention that I'm using the development branch of Postgres 11.
Also as requested, sending the plans via the https://explain.depesz.com
app.

set max_parallel_workers_per_gather = 2;

EXPLAIN ANALYZE
SELECT
"f_zendesktickets_aaeljtllr5at3el"."via_id" AS "a_1303",
COUNT( DISTINCT "f_zendesktags_aakrjpgq72ad93i"."ticket_id_id" ) AS
"c_fb839a9bd6f2015f"
FROM
"f_zendesktags_aakrjpgq72ad93i"
INNER JOIN
"f_zendesktickets_aaeljtllr5at3el"
ON "f_zendesktags_aakrjpgq72ad93i"."ticket_id_id" =
"f_zendesktickets_aaeljtllr5at3el"."id"
GROUP BY
1 ;

Plan here :
https://explain.depesz.com/s/Vyb5

---------------------------------

set max_parallel_workers_per_gather = 6;

Same SQL as above

https://explain.depesz.com/s/9tkK

--------------------------

Strangely with the count_distinct implementation from Tomas Vondra from
https://github.com/tvondra/count_distinct, it doesn' happen, and it scales
nicely with the extra workers.
set max_parallel_workers_per_gather = 6;

EXPLAIN ANALYZE
SELECT
"f_zendesktickets_aaeljtllr5at3el"."via_id" AS "a_1303",
COUNT_DISTINCT ( "f_zendesktags_aakrjpgq72ad93i"."ticket_id_id" ) AS
"c_fb839a9bd6f2015f"
FROM
"f_zendesktags_aakrjpgq72ad93i"
INNER JOIN
"f_zendesktickets_aaeljtllr5at3el"
ON "f_zendesktags_aakrjpgq72ad93i"."ticket_id_id" =
"f_zendesktickets_aaeljtllr5at3el"."id"
GROUP BY
1 ;

https://explain.depesz.com/s/syKw

On Fri, 27 Apr 2018 at 11:18, Guilherme Pereira <guiperpt(at)gmail(dot)com> wrote:

> Hi,
>
> Having a strange situation, where adding extra parallel workers
> (max_parallel_workers_per_gather), the planner chooses a different plan,
> with nested loops, which makes the query twice as slow. Strangely with the
> COUNT_DISTINCT implementation from Tomas Vondra (
> https://github.com/tvondra/count_distinct) it scales nicely (almost
> linearly) with the workers.
>
> Can provide the DDL's or more info if needed. Any ideas why this happens?
>
> set max_parallel_workers_per_gather = 2;
>
> EXPLAIN ANALYZE
> SELECT
> "f_zendesktickets_aaeljtllr5at3el"."via_id" AS "a_1303",
> COUNT( DISTINCT "f_zendesktags_aakrjpgq72ad93i"."ticket_id_id" ) AS
> "c_fb839a9bd6f2015f"
> FROM
> "f_zendesktags_aakrjpgq72ad93i"
> INNER JOIN
> "f_zendesktickets_aaeljtllr5at3el"
> ON "f_zendesktags_aakrjpgq72ad93i"."ticket_id_id" =
> "f_zendesktickets_aaeljtllr5at3el"."id"
> GROUP BY
> 1 ;
>
> GroupAggregate (cost=31676816.72..32928717.16 rows=8 width=12) (actual
> time=124072.467..210958.329 rows=9 loops=1)
> Group Key: f_zendesktickets_aaeljtllr5at3el.via_id
> -> Sort (cost=31676816.72..32094116.84 rows=166920048 width=8)
> (actual time=124072.431..148808.161 rows=166920891 loops=1)
> Sort Key: f_zendesktickets_aaeljtllr5at3el.via_id
> Sort Method: external merge Disk: 2939944kB
> -> Hash Join (cost=1919106.06..6597948.02 rows=166920048
> width=8) (actual time=13063.794..85782.027 rows=166920891 loops=1)
> Hash Cond: (f_zendesktags_aakrjpgq72ad93i.ticket_id_id =
> f_zendesktickets_aaeljtllr5at3el.id)
> -> Seq Scan on f_zendesktags_aakrjpgq72ad93i
> (cost=0.00..2571476.48 rows=166920048 width=4) (actual
> time=0.016..20886.829 rows=166920891 loo
> ps=1)
> -> Hash (cost=1774033.25..1774033.25 rows=11605825
> width=8) (actual time=12981.920..12981.920 rows=11605822 loops=1)
> Buckets: 16777216 Batches: 1 Memory Usage: 584425kB
> -> Seq Scan on f_zendesktickets_aaeljtllr5at3el
> (cost=0.00..1774033.25 rows=11605825 width=8) (actual time=0.045..9262.223
> rows=116058
> 22 loops=1)
> Planning Time: 1.426 ms
> Execution Time: 211441.893 ms
> (13 rows)
>
> ---------------------------------
>
> set max_parallel_workers_per_gather = 6;
>
> EXPLAIN ANALYZE
> SELECT
> "f_zendesktickets_aaeljtllr5at3el"."via_id" AS "a_1303",
> COUNT( DISTINCT "f_zendesktags_aakrjpgq72ad93i"."ticket_id_id" ) AS
> "c_fb839a9bd6f2015f"
> FROM
> "f_zendesktags_aakrjpgq72ad93i"
> INNER JOIN
> "f_zendesktickets_aaeljtllr5at3el"
> ON "f_zendesktags_aakrjpgq72ad93i"."ticket_id_id" =
> "f_zendesktickets_aaeljtllr5at3el"."id"
> GROUP BY
> 1 ;
>
> GroupAggregate (cost=1001.10..31949141.56 rows=8 width=12) (actual
> time=9.125..399880.451 rows=9 loops=1)
> Group Key: f_zendesktickets_aaeljtllr5at3el.via_id
> -> Gather Merge (cost=1001.10..31114541.24 rows=166920048 width=8)
> (actual time=9.037..322148.967 rows=166920891 loops=1)
> Workers Planned: 6
> Workers Launched: 6
> -> Nested Loop (cost=1.01..10826616.89 rows=27820008 width=8)
> (actual time=0.150..30867.494 rows=23845842 loops=7)
> -> Parallel Index Scan using
> f_zendesktickets_aaeljtllr5at3el_via_id_idx on
> f_zendesktickets_aaeljtllr5at3el (cost=0.43..5838596.19 rows=193
> 4304 width=8) (actual time=0.066..3750.816 rows=1657975 loops=7)
> -> Index Only Scan using
> f_zendesktags_aakrjpgq72ad93i_ticket_id_id_idx on
> f_zendesktags_aakrjpgq72ad93i (cost=0.57..2.02 rows=56 width=4) (
> actual time=0.005..0.012 rows=14 loops=11605822)
> Index Cond: (ticket_id_id =
> f_zendesktickets_aaeljtllr5at3el.id)
> Heap Fetches: 166920891
> Planning Time: 1.395 ms
> Execution Time: 400283.994 ms
>
> --------------------------
>
> Strangely with the count_distinct implementation from Tomas Vondra from
> https://github.com/tvondra/count_distinct, it doesn' happen, and it
> scales nicely with the extra workers.
> set max_parallel_workers_per_gather = 6;
>
> EXPLAIN ANALYZE
> SELECT
> "f_zendesktickets_aaeljtllr5at3el"."via_id" AS "a_1303",
> COUNT_DISTINCT ( "f_zendesktags_aakrjpgq72ad93i"."ticket_id_id" ) AS
> "c_fb839a9bd6f2015f"
> FROM
> "f_zendesktags_aakrjpgq72ad93i"
> INNER JOIN
> "f_zendesktickets_aaeljtllr5at3el"
> ON "f_zendesktags_aakrjpgq72ad93i"."ticket_id_id" =
> "f_zendesktickets_aaeljtllr5at3el"."id"
> GROUP BY
> 1 ;
>
> Finalize GroupAggregate (cost=6674091.66..6882748.12 rows=8 width=12)
> (actual time=24724.265..44768.274 rows=9 loops=1)
> Group Key: f_zendesktickets_aaeljtllr5at3el.via_id
> -> Gather Merge (cost=6674091.66..6882747.66 rows=48 width=36)
> (actual time=22655.677..42954.758 rows=60 loops=1)
> Workers Planned: 6
> Workers Launched: 6
> -> Partial GroupAggregate (cost=6673091.57..6881741.73 rows=8
> width=36) (actual time=21427.218..32385.322 rows=9 loops=7)
> Group Key: f_zendesktickets_aaeljtllr5at3el.via_id
> -> Sort (cost=6673091.57..6742641.59 rows=27820008
> width=8) (actual time=20546.722..22817.024 rows=23845842 loops=7)
> Sort Key: f_zendesktickets_aaeljtllr5at3el.via_id
> Sort Method: quicksort Memory: 1851040kB
> Worker 0: Sort Method: quicksort Memory: 1814142kB
> Worker 1: Sort Method: quicksort Memory: 1806328kB
> Worker 2: Sort Method: quicksort Memory: 1814436kB
> Worker 3: Sort Method: quicksort Memory: 1799937kB
> Worker 4: Sort Method: quicksort Memory: 1816058kB
> Worker 5: Sort Method: quicksort Memory: 1815833kB
> -> Parallel Hash Join (cost=1701496.84..3233200.50
> rows=27820008 width=8) (actual time=3094.046..15445.013 rows=23845842
> loops=7)
> Hash Cond:
> (f_zendesktags_aakrjpgq72ad93i.ticket_id_id =
> f_zendesktickets_aaeljtllr5at3el.id)
> -> Parallel Seq Scan on
> f_zendesktags_aakrjpgq72ad93i (cost=0.00..1180476.08 rows=27820008
> width=4) (actual time=0.014..3673.446
> rows=23845842 loops=7)
> -> Parallel Hash (cost=1677318.04..1677318.04
> rows=1934304 width=8) (actual time=3078.160..3078.160 rows=1657975 loops=7)
> Buckets: 16777216 Batches: 1 Memory
> Usage: 585248kB
> -> Parallel Seq Scan on
> f_zendesktickets_aaeljtllr5at3el (cost=0.00..1677318.04 rows=1934304
> width=8) (actual time=0.067..
> 2171.773 rows=1657975 loops=7)
> Planning Time: 1.386 ms
> Execution Time: 45340.324 ms
> (24 rows)
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message insaf.k 2018-04-27 09:59:24 Issues while building PG in MS Windows, using MSYS2 and MinGW-w64
Previous Message Hartmut Holzgraefe 2018-04-27 09:46:25 Re: Is a modern build system acceptable for older platforms