Re: Propagate stadistinct through GROUP BY/DISTINCT in subqueries and CTEs

From: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Propagate stadistinct through GROUP BY/DISTINCT in subqueries and CTEs
Date: 2026-04-13 03:27:39
Message-ID: CAGjGUAK7cmV3u6YE3OBQCuhDh6hrhwY=meaEch0VDT3Kn32WDw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

HI Richard

> + /* Convert negative stadistinct to absolute count */
> + if (stats->stadistinct < 0)
> + {
> + RelOptInfo *baserel = find_base_rel(subroot, var->varno);
> +
> + if (baserel->tuples > 0)
> + {
> + stats->stadistinct = (float4)
> + clamp_row_est(-stats->stadistinct * baserel->tuples);
> + }
> + }

Thanks so much for working on this! While looking at the negative
stadistinct conversion, I was wondering if we might run into a potential
edge case with multi-level nested subqueries. What do you think?

/* Convert negative stadistinct to absolute count */

if (stats->stadistinct < 0)
{
- RelOptInfo *baserel = find_base_rel(subroot, var->varno);
+ RelOptInfo *baserel = vardata->rel;

- if (baserel->tuples > 0)
+ if (baserel && baserel->tuples > 0)
{
stats->stadistinct = (float4)
clamp_row_est(-stats->stadistinct * baserel->tuples);
}
}

Thanks

On Mon, Apr 13, 2026 at 9:19 AM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:

> I complained in [1] that some TPC-DS queries suffer from very poor
> cardinality estimates on CTE scan filters, to the point that simply
> disabling nestloop makes some queries run hundreds of times faster.
> Here's a simple reproduction:
>
> create table t (a int, b int, c int);
> insert into t select i%2, i, i from generate_series(1,1000) i;
> analyze t;
>
> explain analyze
> with cte as (select a, b, avg(c) as avg from t group by a, b)
> select * from cte t1, cte t2
> where t1.a = 1 and t2.a = 1 and t1.avg = t2.avg;
>
> Column 'a' has only 2 distinct values, so the filter a=1 on the
> 1000-row CTE output should estimate ~500 rows (assuming these values
> are equally common). Instead, the CTE scan estimates 5 rows (1000 *
> 1/200) because examine_simple_variable returns early when the subquery
> has GROUP BY, and selectivity estimation falls back on
> 1/DEFAULT_NUM_DISTINCT.
>
> -> CTE Scan on cte t1 (cost=0.00..22.50 rows=5 width=40)
> (actual time=4.874..5.053 rows=500.00 loops=1)
>
> As a result, this query ends up with a Nested Loop plan, and the
> Execution Time is 192.907 ms.
>
> For DISTINCT or GROUP BY key columns that are simple Vars, I think we
> can propagate stadistinct from the base table, because the set of
> distinct values is preserved after grouping. MCV frequencies,
> histograms, and correlation data are not valid since GROUP BY and
> DISTINCT change the frequency distribution, but with stadistinct
> alone, callers like var_eq_const() can use a 1/ndistinct estimate
> rather than 1/DEFAULT_NUM_DISTINCT.
>
> Attached is a patch to do this. With the patch, the example above
> estimates 500 rows ...
>
> -> CTE Scan on cte t1 (cost=0.00..22.50 rows=500 width=40)
> (actual time=3.785..4.143 rows=500.00 loops=1)
>
> ... and chooses a Hash Join, with an Execution Time of 8.238 ms (~20x
> faster).
>
> I tested this patch on TPC-DS query 31:
>
> -- on master:
> Planning Time: 5.207 ms
> Execution Time: 1536140.258 ms
>
> -- on patched:
> Planning Time: 5.140 ms
> Execution Time: 1149.482 ms
>
> Over 1300x faster.
>
> Does this approach make sense? Any thoughts?
>
> [1]
> https://postgr.es/m/CAMbWs4-QU_nkFqFZLdzWRsEsVE8aLWx4qBBVq7g4rXw+cvYDMg@mail.gmail.com
>
> - Richard
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2026-04-13 03:52:59 Re: test_compression, test module for low-level compression APIs (for 2b5ba2a0a141)
Previous Message vignesh C 2026-04-13 03:21:06 Re: EXCEPT TABLE - Case inconsistency for describe \d and \dRp+