| From: | wenhui qiu <qiuwenhuifx(at)gmail(dot)com> |
|---|---|
| To: | Richard Guo <guofenglinux(at)gmail(dot)com> |
| Cc: | Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Fix HAVING-to-WHERE pushdown with nondeterministic collations |
| Date: | 2026-04-01 02:19:09 |
| Message-ID: | CAGjGUA+zH9co3x-vGTDUp2h9sS0NQ9XPF9ZWL_D-aP1KE9BpKg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi Richard
> + if (OidIsValid(inputcollid))
> + {
> + List *vars;
> +
> + /*
> + * We use PVC_RECURSE_PLACEHOLDERS because PlaceHolderVars may have
> + * been introduced by pull_up_subqueries, and we need to look through
> + * them to find the underlying Vars. We don't need to consider
> + * Aggrefs because clauses containing aggregates are already excluded
> + * from HAVING-to-WHERE pushdown by the contain_agg_clause check.
> + * Likewise, WindowFuncs are ignored since they cannot appear in a
> + * HAVING clause.
> + */
> + vars = pull_var_clause(node, PVC_RECURSE_PLACEHOLDERS);
> +
> + foreach_node(Var, var, vars)
> + {
> + if (var->varno == *group_rtindex &&
> + OidIsValid(var->varcollid) &&
> + var->varcollid != inputcollid &&
> + !get_collation_isdeterministic(var->varcollid))
> + {
> + list_free(vars);
> + return true;
> + }
> + }
> +
> + list_free(vars);
> + }
> +
> + return expression_tree_walker(node, having_collation_conflict_walker,
> + group_rtindex);
> +}
This might be overthinking, but I wonder if calling pull_var_clause() at
each walker step could introduce some overhead due to repeated subtree scans
,Should we add a test (SELECT x, count(*) FROM test3ci GROUP BY x HAVING
max(x) = 'abc' COLLATE case_sensitive;)
Thanks
On Tue, Mar 31, 2026 at 11:41 AM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> As briefly discussed on Discord, when a GROUP BY clause uses a
> nondeterministic collation, the planner's optimization of moving
> HAVING clauses to WHERE can produce incorrect results if the HAVING
> clause applies a stricter collation.
>
> CREATE TABLE t (x TEXT COLLATE case_insensitive);
> INSERT INTO t VALUES ('a'), ('A');
>
> SELECT x, count(*) FROM t GROUP BY x HAVING x = 'a' COLLATE "C";
>
> This returns count=1, but should return count=2.
>
> The attached draft patch fixes this for HEAD by leveraging GROUP Vars
> (Vars referencing RTE_GROUP) to detect collation conflicts on a
> per-clause basis, so only unsafe clauses are kept in HAVING while safe
> ones are still pushed. Please see the commit message for more
> details.
>
> For versions prior to v18, we do not have GROUP Vars. I wonder if we
> can take a conservative approach: skipping the HAVING-to-WHERE
> pushdown optimization entirely if any GROUP BY expression uses a
> nondeterministic collation.
>
> Thoughts and reviews are welcome.
>
> - Richard
>
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Raghav Mittal | 2026-04-01 02:22:49 | Proposal: Track last-used timestamp for index usage |
| Previous Message | Amit Kapila | 2026-04-01 02:15:38 | Re: Skipping schema changes in publication |