Re: Re: fix cost subqueryscan wrong parallel cost

From: "bucoo(at)sohu(dot)com" <bucoo(at)sohu(dot)com>
To: robertmhaas <robertmhaas(at)gmail(dot)com>
Cc: "Richard Guo" <guofenglinux(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: fix cost subqueryscan wrong parallel cost
Date: 2022-04-20 14:00:46
Message-ID: 2022042022004640700325@sohu.com
Lists: pgsql-hackers

> Sure, but that doesn't make the patch correct. The patch proposes
> that, when parallelism is in use, a subquery scan will produce fewer rows
> than when parallelism is not in use, and that's 100% false. Compare
> this with the case of a parallel sequential scan. If a table contains
> 1000 rows, and we scan it with a regular Seq Scan, the Seq Scan will
> return 1000 rows. But if we scan it with a Parallel Seq Scan using
> say 4 workers, the number of rows returned in each worker will be
> substantially less than 1000, because 1000 is now the *total* number
> of rows to be returned across *all* processes, and what we need is the
> number of rows returned in *each* process.

Currently, cost_subqueryscan() always uses the *total* row count, even for a
parallel path, like this:

Gather (rows=30000)
  Workers Planned: 2
  -> Subquery Scan (rows=30000) -- *total* rows; should equal the subpath's rows
        -> Parallel Seq Scan (rows=10000)

Perhaps this code:

    /* Mark the path with the correct row estimate */
    if (param_info)
        path->path.rows = param_info->ppi_rows;
    else
        path->path.rows = baserel->rows;

should be changed to:

    /* Mark the path with the correct row estimate */
    if (path->path.parallel_workers > 0)
        path->path.rows = path->subpath->rows;
    else if (param_info)
        path->path.rows = param_info->ppi_rows;
    else
        path->path.rows = baserel->rows;

bucoo(at)sohu(dot)com
