incremental sort vs. gather paths

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>
Subject: incremental sort vs. gather paths
Date: 2021-12-16 16:48:19
Message-ID: CA+TgmoaEB7P+-0PAzoBxy265OWzCXy8MPokpWMbWVaVvRLV=Bw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

This code introduced in ba3e76cc571eba3dea19c9465ff15ac3ac186576 looks
wrong to me:

total_groups = cheapest_partial_path->rows *
cheapest_partial_path->parallel_workers;
path = (Path *)
create_gather_merge_path(root, ordered_rel,
path,
path->pathtarget,
root->sort_pathkeys, NULL,
&total_groups);

This too:

total_groups = input_path->rows *
input_path->parallel_workers;

This came to my attention because Dave Page sent me a query plan that
looks like this:

Gather Merge (cost=22617.94..22703.35 rows=732 width=97) (actual
time=2561.476..2561.856 rows=879 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=21617.92..21618.83 rows=366 width=97) (actual
time=928.329..928.378 rows=293 loops=3)
Sort Key: aid
Sort Method: quicksort Memory: 155kB
Worker 0: Sort Method: quicksort Memory: 25kB
Worker 1: Sort Method: quicksort Memory: 25kB
-> Parallel Seq Scan on accounts_1m (cost=0.00..21602.33
rows=366 width=97) (actual time=74.391..74.518 rows=293 loops=3)
Filter: (aid < 10000000)
Rows Removed by Filter: 333040

If you look at the actual row count estimates, you see that the Gather
Merge produces 3 times the number of rows that the Parallel Seq Scan
produces, which is completely correct, because the raw number is 897
in both cases, but EXPLAIN unhelpfully divides the displayed row count
by the loop count, which in this case is 3. If you look at the
estimated row count, you see that the Gather Merge is estimated to
produce exactly 2 times the number of rows that the nodes under it
would produce. That's not a very good estimate, unless
parallel_leader_participation=off, which in this case it isn't.

What's happening here is that the actual number of rows produced by
accounts_1m is actually 879 and is estimated as 879.
get_parallel_divisor() decides to divide that number by 2.4, and gets
366, because it thinks the leader will do less work than the other
workers. Then the code above fires and, instead of letting the
original row count estimate for accounts_1m apply to the Gather Merge
path, it overrides it with 2 * 366 = 732. This cannot be right.
Really, I don't think it should be overriding the row count estimate
at all. But if it is, multiplying by the number of workers can't be
right, because the leader can also do stuff.

--
Robert Haas
EDB: http://www.enterprisedb.com

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2021-12-16 16:55:19 Re: Emit a warning if the extension's GUC is set incorrectly
Previous Message Shay Rojansky 2021-12-16 16:42:50 Re: Privilege required for IF EXISTS event if the object already exists