Re: Propagate stadistinct through GROUP BY/DISTINCT in subqueries and CTEs

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Propagate stadistinct through GROUP BY/DISTINCT in subqueries and CTEs
Date: 2026-04-14 06:11:06
Message-ID: CAMbWs4-Nib006vz1y9i5xE-0of9c-27oGyVS88uZgmOmPqST5w@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Apr 13, 2026 at 12:27 PM wenhui qiu <qiuwenhuifx(at)gmail(dot)com> wrote:

> Thanks so much for working on this! While looking at the negative stadistinct conversion, I was wondering if we might run into a potential edge case with multi-level nested subqueries. What do you think?
>
> /* Convert negative stadistinct to absolute count */
>
> if (stats->stadistinct < 0)
> {
> - RelOptInfo *baserel = find_base_rel(subroot, var->varno);
> + RelOptInfo *baserel = vardata->rel;
>
> - if (baserel->tuples > 0)
> + if (baserel && baserel->tuples > 0)
> {
> stats->stadistinct = (float4)
> clamp_row_est(-stats->stadistinct * baserel->tuples);
> }
> }

I don't think your proposed change would work. vardata->rel is the
CTE/subquery scan rel in the outer query, and its tuples count is the
CTE's output row count, not the base table's. Using it would be
equivalent to not converting at all, since get_variable_numdistinct()
already computes -stadistinct * vardata->rel->tuples. What we need
here is the base table's rel in the subroot, which gives us the
correct rowcount for interpreting the negative fraction.

- Richard

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Chao Li 2026-04-14 06:19:46 Re: Improve logical replication usability when tables lack primary keys
Previous Message vignesh C 2026-04-14 05:51:40 Re: Add missing period to DETAIL messages