Re: Improve UNION's output rowcount estimate

From: Chengpeng Yan <chengpeng_yan(at)outlook(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Improve UNION's output rowcount estimate
Date: 2026-06-25 03:12:13
Message-ID: F40A744E-1671-4431-BBA8-63F9BFD86D59@outlook.com
Lists: pgsql-hackers

Hi,

> On Jun 22, 2026, at 10:59, Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
>
> On Mon, Jun 22, 2026 at 9:39 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>> I believe this should make the following code redundant, so shouldn't
>> the patch remove it too?
>>
>> /*
>>  * Estimate the number of UNION output rows. In the case when only a
>>  * single UNION child remains, we can use estimate_num_groups() on
>>  * that child. We must be careful not to do this when that child is
>>  * the result of some other set operation as the targetlist will
>>  * contain Vars with varno==0, which estimate_num_groups() wouldn't
>>  * like.
>>  */
>> if (list_length(cheapest.subpaths) == 1 &&
>>     first_path->parent->reloptkind != RELOPT_UPPER_REL)
>> {
>>     dNumGroups = estimate_num_groups(root,
>>                                      first_path->pathtarget->exprs,
>>                                      first_path->rows,
>>                                      NULL,
>>                                      NULL);
>> }
>>
>> Then you may as well pass dNumChildGroups directly to the path
>> creation functions and get rid of your new "With multiple children,"
>> comment.
>>
>> Aside from that, I don't see any issues.
>
> Thanks for looking. You're right. build_setop_child_paths() already
> computes each child's distinct estimate, so for a single surviving
> child dNumChildGroups is exactly what that branch recomputed.
>
> (And removing it can be a slight improvement, as the old branch ran
> estimate_num_groups on the subquery-scan Vars, while
> build_setop_child_paths uses the child's own rowcount when it has
> GROUP BY/DISTINCT/aggs.)
>
> Patch updated.
>
> - Richard
> <v2-0001-Improve-UNION-s-output-row-count-estimate.patch>

Thanks for working on this. I agree this is a real problem, and
estimating UNION’s row count from the per-child distinct estimates looks
like the right direction to me.
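
To make the idea concrete, here is a minimal standalone sketch of how I
read it. This is not code from the patch: union_rows_estimate() and its
parameters are hypothetical names, and the combining rule is my
assumption (cap each child's distinct estimate at its row count, sum the
results as if the children shared no rows, and never return less than
one row).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from the patch: combine per-child
 * distinct-row estimates into a UNION output row estimate.
 *
 * Assumptions: a child cannot contribute more distinct rows than it
 * emits, and the children are treated as sharing no rows, so the sum
 * is an upper bound on the number of distinct output rows.
 */
static double
union_rows_estimate(const double *child_rows,
                    const double *child_groups,
                    size_t nchildren)
{
    double      total = 0.0;

    assert(child_rows != NULL && child_groups != NULL);

    for (size_t i = 0; i < nchildren; i++)
    {
        double      groups = child_groups[i];

        /* Cap the distinct estimate at the child's own row count. */
        if (groups > child_rows[i])
            groups = child_rows[i];

        total += groups;
    }

    /* Never report fewer than one row. */
    return (total < 1.0) ? 1.0 : total;
}
```

For example, with two children estimated at 1000 and 500 rows and
distinct estimates of 10 and 600, the sketch caps the second child at
500 and returns 510 rather than 1500.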

I reviewed the v2 patch and it looks good. I also ran the regression
tests locally on my Apple Silicon machine, and they all passed.

The added regression test looks reasonable to me, since it checks the
plan change that motivated the patch. One small question: would it be
worth adding a direct check of the row estimate, perhaps with a helper
like planner_est.sql’s explain_mask_costs(), so that rows= stays visible
while cost and width are masked? The existing plan-shape test may
already be sufficient, though.

--
Best regards,
Chengpeng Yan
