Re: fix cost subqueryscan wrong parallel cost

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: fix cost subqueryscan wrong parallel cost
Date: 2022-04-29 19:06:58
Message-ID: CAKFQuwYqXCS=Hu4=kXmKwactpqK2v9cqJifz1gWX-RniFJRnnw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 29, 2022 at 11:09 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

>
> In short, these SubqueryScans are being labeled as producing 60000 rows
> when their input only produces 25000 rows, which is surely insane.
>
> So: even though the SubqueryScan itself isn't parallel-aware, the number
> of rows it processes has to be de-rated according to the number of workers
> involved.

Right, so why does baserel->rows show 60,000 here when path->subpath->rows
shows only 25,000? I ask because if you substitute path->subpath->rows for
baserel->rows in cost_subqueryscan, you get (with your cost change above):

 Incremental Sort  (cost=27875.50..45577.57 rows=120000 width=12) (actual time=165.285..235.749 rows=60000 loops=1)
   Sort Key: "*SELECT* 1".a, "*SELECT* 1".c
   Presorted Key: "*SELECT* 1".a
   Full-sort Groups: 10  Sort Method: quicksort  Average Memory: 28kB  Peak Memory: 28kB
   Pre-sorted Groups: 10  Sort Method: quicksort  Average Memory: 521kB  Peak Memory: 521kB
   ->  Unique  (cost=27794.85..28994.85 rows=120000 width=12) (actual time=157.882..220.501 rows=60000 loops=1)
         ->  Sort  (cost=27794.85..28094.85 rows=120000 width=12) (actual time=157.881..187.232 rows=120000 loops=1)
               Sort Key: "*SELECT* 1".a, "*SELECT* 1".b, "*SELECT* 1".c
               Sort Method: external merge  Disk: 2600kB
               ->  Gather  (cost=0.00..1400.00 rows=120000 width=12) (actual time=0.197..22.705 rows=120000 loops=1)
                     Workers Planned: 2
                     Workers Launched: 2
                     ->  Parallel Append  (cost=0.00..1400.00 rows=50000 width=12) (actual time=0.015..13.101 rows=40000 loops=3)
                           ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..575.00 rows=25000 width=12) (actual time=0.014..6.864 rows=30000 loops=2)
                                 ->  Parallel Seq Scan on t  (cost=0.00..575.00 rows=25000 width=12) (actual time=0.014..3.708 rows=30000 loops=2)
                           ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..575.00 rows=25000 width=12) (actual time=0.010..6.918 rows=30000 loops=2)
                                 ->  Parallel Seq Scan on t t_1  (cost=0.00..575.00 rows=25000 width=12) (actual time=0.010..3.769 rows=30000 loops=2)
 Planning Time: 0.137 ms
 Execution Time: 239.958 ms
(19 rows)

This plan shows the 1400 cost you were aiming for from the UNION ALL case,
along with the expected row counts, for Gather atop Append.

The fact that baserel->rows > path->subpath->rows here seems like a
straight bug. There are no filters involved in this case, and even in the
presence of filters baserel->rows should never exceed path->subpath->rows
(that is, baserel->rows <= path->subpath->rows), right?
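
To make the arithmetic behind those row counts concrete, here is a minimal
standalone sketch of the per-worker de-rating as I understand the planner to
apply it to parallel scans: with 2 workers and a participating leader the
divisor is 2 + (1.0 - 0.3 * 2) = 2.4, and 60000 / 2.4 = 25000. The function
names below (parallel_divisor, round_rows, per_worker_rows,
scan_rows_are_sane) are hypothetical and exist only for this illustration;
this is not the costsize.c code, and the 0.3 leader factor is my reading of
the current heuristic rather than something stated in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the planner's parallel divisor.
 * Assumption: each worker counts as 1.0, and a participating leader
 * contributes 1.0 - 0.3 * workers when that value is positive.
 */
double
parallel_divisor(int parallel_workers, bool leader_participation)
{
    double      divisor = parallel_workers;

    assert(parallel_workers >= 0);

    if (leader_participation)
    {
        double      leader_contribution = 1.0 - (0.3 * parallel_workers);

        if (leader_contribution > 0)
            divisor += leader_contribution;
    }
    return divisor;
}

/* Round to the nearest whole row and never report fewer than one row. */
double
round_rows(double rows)
{
    double      rounded = (double) (long) (rows + 0.5);

    return (rounded < 1.0) ? 1.0 : rounded;
}

/* Per-process row estimate for a partial path over total_rows rows. */
double
per_worker_rows(double total_rows, int workers, bool leader_participation)
{
    return round_rows(total_rows /
                      parallel_divisor(workers, leader_participation));
}

/*
 * The sanity condition argued for above: a SubqueryScan cannot emit
 * more rows than its subpath feeds it, with or without filters.
 */
bool
scan_rows_are_sane(double scan_rows, double subpath_rows)
{
    return scan_rows <= subpath_rows;
}
```

Under these assumptions, per_worker_rows(60000, 2, true) gives the 25000
shown on each Parallel Seq Scan, and scan_rows_are_sane(60000, 25000) is
false, which is the inconsistency Tom pointed out in the original plan.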

David J.
