Re: Propagate stadistinct through GROUP BY/DISTINCT in subqueries and CTEs

From: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Propagate stadistinct through GROUP BY/DISTINCT in subqueries and CTEs
Date: 2026-04-14 07:58:51
Message-ID: CAGjGUALqgfapfAy+i7mYSq9+dOhg_4sf6C4bUhbM1+-uQqyoPw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Richard
> I don't think your proposed change would work. vardata->rel is the
> CTE/subquery scan rel in the outer query, and its tuples count is the
> CTE's output row count, not the base table's. Using it would be
> equivalent to not converting at all, since get_variable_numdistinct()
> already computes -stadistinct * vardata->rel->tuples. What we need
> here is the base table's rel in the subroot, which gives us the
> correct rowcount for interpreting the negative fraction.
Thank you for your explanation. The path LGTM

Thanks

On Tue, Apr 14, 2026 at 2:11 PM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:

> On Mon, Apr 13, 2026 at 12:27 PM wenhui qiu <qiuwenhuifx(at)gmail(dot)com> wrote:
>
> > Thanks so much for working on this! While looking at the negative
> stadistinct conversion, I was wondering if we might run into a potential
> edge case with multi-level nested subqueries. What do you think?
> >
> > /* Convert negative stadistinct to absolute count */
> >
> > if (stats->stadistinct < 0)
> > {
> > - RelOptInfo *baserel = find_base_rel(subroot, var->varno);
> > + RelOptInfo *baserel = vardata->rel;
> >
> > - if (baserel->tuples > 0)
> > + if (baserel && baserel->tuples > 0)
> > {
> > stats->stadistinct = (float4)
> > clamp_row_est(-stats->stadistinct * baserel->tuples);
> > }
> > }
>
> I don't think your proposed change would work. vardata->rel is the
> CTE/subquery scan rel in the outer query, and its tuples count is the
> CTE's output row count, not the base table's. Using it would be
> equivalent to not converting at all, since get_variable_numdistinct()
> already computes -stadistinct * vardata->rel->tuples. What we need
> here is the base table's rel in the subroot, which gives us the
> correct rowcount for interpreting the negative fraction.
>
> - Richard
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2026-04-14 08:08:22 meson html:alias vs. html:custom
Previous Message Amit Kapila 2026-04-14 07:51:55 Re: Support EXCEPT for TABLES IN SCHEMA publications