| From: | wenhui qiu <qiuwenhuifx(at)gmail(dot)com> |
|---|---|
| To: | Richard Guo <guofenglinux(at)gmail(dot)com> |
| Cc: | Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Propagate stadistinct through GROUP BY/DISTINCT in subqueries and CTEs |
| Date: | 2026-04-13 03:27:39 |
| Message-ID: | CAGjGUAK7cmV3u6YE3OBQCuhDh6hrhwY=meaEch0VDT3Kn32WDw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
HI Richard
> + /* Convert negative stadistinct to absolute count */
> + if (stats->stadistinct < 0)
> + {
> + RelOptInfo *baserel = find_base_rel(subroot, var->varno);
> +
> + if (baserel->tuples > 0)
> + {
> + stats->stadistinct = (float4)
> + clamp_row_est(-stats->stadistinct * baserel->tuples);
> + }
> + }
Thanks so much for working on this! While looking at the negative
stadistinct conversion, I was wondering if we might run into a potential
edge case with multi-level nested subqueries. What do you think?
/* Convert negative stadistinct to absolute count */
if (stats->stadistinct < 0)
{
- RelOptInfo *baserel = find_base_rel(subroot, var->varno);
+ RelOptInfo *baserel = vardata->rel;
- if (baserel->tuples > 0)
+ if (baserel && baserel->tuples > 0)
{
stats->stadistinct = (float4)
clamp_row_est(-stats->stadistinct * baserel->tuples);
}
}
Thanks
On Mon, Apr 13, 2026 at 9:19 AM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> I complained in [1] that some TPC-DS queries suffer from very poor
> cardinality estimates on CTE scan filters, to the point that simply
> disabling nestloop makes some queries run hundreds of times faster.
> Here's a simple reproduction:
>
> create table t (a int, b int, c int);
> insert into t select i%2, i, i from generate_series(1,1000) i;
> analyze t;
>
> explain analyze
> with cte as (select a, b, avg(c) as avg from t group by a, b)
> select * from cte t1, cte t2
> where t1.a = 1 and t2.a = 1 and t1.avg = t2.avg;
>
> Column 'a' has only 2 distinct values, so the filter a=1 on the
> 1000-row CTE output should estimate ~500 rows (assuming these values
> are equally common). Instead, the CTE scan estimates 5 rows (1000 *
> 1/200) because examine_simple_variable returns early when the subquery
> has GROUP BY, and selectivity estimation falls back on
> 1/DEFAULT_NUM_DISTINCT.
>
> -> CTE Scan on cte t1 (cost=0.00..22.50 rows=5 width=40)
> (actual time=4.874..5.053 rows=500.00 loops=1)
>
> As a result, this query ends up with a Nested Loop plan, and the
> Execution Time is 192.907 ms.
>
> For DISTINCT or GROUP BY key columns that are simple Vars, I think we
> can propagate stadistinct from the base table, because the set of
> distinct values is preserved after grouping. MCV frequencies,
> histograms, and correlation data are not valid since GROUP BY and
> DISTINCT change the frequency distribution, but with stadistinct
> alone, callers like var_eq_const() can use a 1/ndistinct estimate
> rather than 1/DEFAULT_NUM_DISTINCT.
>
> Attached is a patch to do this. With the patch, the example above
> estimates 500 rows ...
>
> -> CTE Scan on cte t1 (cost=0.00..22.50 rows=500 width=40)
> (actual time=3.785..4.143 rows=500.00 loops=1)
>
> ... and chooses a Hash Join, with an Execution Time of 8.238 ms (~20x
> faster).
>
> I tested this patch on TPC-DS query 31:
>
> -- on master:
> Planning Time: 5.207 ms
> Execution Time: 1536140.258 ms
>
> -- on patched:
> Planning Time: 5.140 ms
> Execution Time: 1149.482 ms
>
> Over 1300x faster.
>
> Does this approach make sense? Any thoughts?
>
> [1]
> https://postgr.es/m/CAMbWs4-QU_nkFqFZLdzWRsEsVE8aLWx4qBBVq7g4rXw+cvYDMg@mail.gmail.com
>
> - Richard
>
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Michael Paquier | 2026-04-13 03:52:59 | Re: test_compression, test module for low-level compression APIs (for 2b5ba2a0a141) |
| Previous Message | vignesh C | 2026-04-13 03:21:06 | Re: EXCEPT TABLE - Case inconsistency for describe \d and \dRp+ |