Re: Fix HAVING-to-WHERE pushdown with nondeterministic collations

From: wenhui qiu <qiuwenhuifx(at)gmail(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Fix HAVING-to-WHERE pushdown with nondeterministic collations
Date: 2026-04-01 02:19:09
Message-ID: CAGjGUA+zH9co3x-vGTDUp2h9sS0NQ9XPF9ZWL_D-aP1KE9BpKg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Richard
> + if (OidIsValid(inputcollid))
> + {
> + List *vars;
> +
> + /*
> + * We use PVC_RECURSE_PLACEHOLDERS because PlaceHolderVars may have
> + * been introduced by pull_up_subqueries, and we need to look through
> + * them to find the underlying Vars. We don't need to consider
> + * Aggrefs because clauses containing aggregates are already excluded
> + * from HAVING-to-WHERE pushdown by the contain_agg_clause check.
> + * Likewise, WindowFuncs are ignored since they cannot appear in a
> + * HAVING clause.
> + */
> + vars = pull_var_clause(node, PVC_RECURSE_PLACEHOLDERS);
> +
> + foreach_node(Var, var, vars)
> + {
> + if (var->varno == *group_rtindex &&
> + OidIsValid(var->varcollid) &&
> + var->varcollid != inputcollid &&
> + !get_collation_isdeterministic(var->varcollid))
> + {
> + list_free(vars);
> + return true;
> + }
> + }
> +
> + list_free(vars);
> + }
> +
> + return expression_tree_walker(node, having_collation_conflict_walker,
> + group_rtindex);
> +}
This might be overthinking, but I wonder if calling pull_var_clause() at
each walker step could introduce some overhead due to repeated subtree scans
,Should we add a test (SELECT x, count(*) FROM test3ci GROUP BY x HAVING
max(x) = 'abc' COLLATE case_sensitive;)

Thanks

On Tue, Mar 31, 2026 at 11:41 AM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:

> As briefly discussed on Discord, when a GROUP BY clause uses a
> nondeterministic collation, the planner's optimization of moving
> HAVING clauses to WHERE can produce incorrect results if the HAVING
> clause applies a stricter collation.
>
> CREATE TABLE t (x TEXT COLLATE case_insensitive);
> INSERT INTO t VALUES ('a'), ('A');
>
> SELECT x, count(*) FROM t GROUP BY x HAVING x = 'a' COLLATE "C";
>
> This returns count=1, but should return count=2.
>
> The attached draft patch fixes this for HEAD by leveraging GROUP Vars
> (Vars referencing RTE_GROUP) to detect collation conflicts on a
> per-clause basis, so only unsafe clauses are kept in HAVING while safe
> ones are still pushed. Please see the commit message for more
> details.
>
> For versions prior to v18, we do not have GROUP Vars. I wonder if we
> can take a conservative approach: skipping the HAVING-to-WHERE
> pushdown optimization entirely if any GROUP BY expression uses a
> nondeterministic collation.
>
> Thoughts and reviews are welcome.
>
> - Richard
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Raghav Mittal 2026-04-01 02:22:49 Proposal: Track last-used timestamp for index usage
Previous Message Amit Kapila 2026-04-01 02:15:38 Re: Skipping schema changes in publication