Re: allow partial union-all and improve parallel subquery costing

From: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Luc Vlaming <luc(at)swarm64(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>
Subject: Re: allow partial union-all and improve parallel subquery costing
Date: 2021-07-23 11:46:42
Message-ID: 2718932.eWMxsFeAN2@aivenronan
Lists: pgsql-hackers

On Monday, April 12, 2021, 14:01:36 CEST, Luc Vlaming wrote:
> Here's an improved and rebased patch. Hope the description helps some
> people. I will resubmit it to the next commitfest.
>

Hello Luc,

I've taken a look at this patch, and while I don't fully understand all of its
implications, here are a couple of remarks.

I think you should add a test demonstrating the use of the new partial append
path you add, for example using your base query:

explain (costs off)
select sum(two) from
(
select *, 1::int from tenk1 a
union all
select *, 1::bigint from tenk1 b
) t
;

I'm not sure I understand why the subquery scan rows estimate wasn't already
computed the way you propose. The way it's done now basically doubles the
estimate for the subquery scan, because cost_append assumes that a partial
subpath's rows have already been divided by its number of workers, as
mentioned in this comment in cost_append:

/*
* Apply parallel divisor to subpaths. Scale the number of rows
* for each partial subpath based on the ratio of the parallel
* divisor originally used for the subpath to the one we adopted.
* Also add the cost of partial paths to the total cost, but
* ignore non-partial paths for now.
*/
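To make that arithmetic concrete, here is a minimal standalone C sketch of the row scaling described in the comment, assuming the divisor formula used by get_parallel_divisor() in costsize.c. It is a simplification, not PostgreSQL source: the names parallel_divisor, append_subpath_rows and leader_participates are hypothetical, the last one standing in for the parallel_leader_participation GUC. It shows that if a partial subpath reports its total row count instead of a per-worker count, the append's per-worker estimate ends up too high by the parallel divisor (about 2.4 with two workers).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the parallel_leader_participation GUC. */
static bool leader_participates = true;

/*
 * Same formula as get_parallel_divisor() in costsize.c: the number of
 * workers, plus the leader's share (1 - 0.3 * workers) when the leader
 * participates and that share is still positive.
 */
static double
parallel_divisor(int parallel_workers)
{
    double      divisor = parallel_workers;

    assert(parallel_workers > 0);
    if (leader_participates)
    {
        double      leader_contribution = 1.0 - (0.3 * parallel_workers);

        if (leader_contribution > 0)
            divisor += leader_contribution;
    }
    return divisor;
}

/*
 * Per-worker rows contributed by one subpath to a parallel append,
 * following the two cases in cost_append():
 *  - a non-partial subpath runs in a single process, so its total rows
 *    are spread over the append's divisor (subpath_workers is unused);
 *  - a partial subpath's rows are assumed to be per-worker already, so
 *    they are only rescaled from the subpath's divisor to the append's.
 */
static double
append_subpath_rows(double subpath_rows, bool partial,
                    int subpath_workers, int append_workers)
{
    double      append_divisor = parallel_divisor(append_workers);

    if (!partial)
        return subpath_rows / append_divisor;
    return subpath_rows * (parallel_divisor(subpath_workers) / append_divisor);
}
```

With four workers the leader's share is negative, so the divisor is exactly 4: a per-worker estimate of 2500 rows passes through unchanged, while an undivided estimate of 10000 rows is taken as 10000 rows per worker.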

Do we have other nodes for which we make this assumption?

Also, adding a partial path composed only of underlying partial paths might
not be enough: maybe we should add a partial path even in the case of mixed
partial/non-partial subpaths, as is done in add_paths_to_append_rel?

Regards,

--
Ronan Dunklau
