| From: | Richard Guo <guofenglinux(at)gmail(dot)com> |
|---|---|
| To: | Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Wrong results from inner-unique joins caused by collation mismatch |
| Date: | 2026-04-24 11:42:28 |
| Message-ID: | CAMbWs4_XUUSTyzCaRjUeeahWNqi=8ZOA5Q4coi8zUVEDSBkM6A@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
While working on the wrong results issue caused by collation mismatch
in GROUP BY and havingQual [1], I noticed $subject:
create collation ci (provider = icu, locale = 'und-u-ks-level2',
deterministic = false);
create table t (a text);
insert into t values ('A'), ('a');
create unique index on t (a);
-- wrong results: should be 4 rows
select * from t t1 join t t2 on t1.a = t2.a collate ci;
a | a
---+---
A | A
a | a
(2 rows)
The root cause is explained by the XXX comment in
relation_has_unique_index_for():
/*
* XXX at some point we may need to check collations here too.
* For the moment we assume all collations reduce to the same
* notion of equality.
*/
That assumption stopped being safe when nondeterministic collations
were introduced in PG 12. A unique index enforces uniqueness under
its own collation; if a query's equality clause uses a different
collation, and either side is nondeterministic, the index's uniqueness
does not imply uniqueness under the clause.
Several planner optimizations use this uniqueness proof, and all of
them can yield wrong results in this scenario. These include
inner-unique join execution, left-join removal, semijoin-to-innerjoin
reduction, and self-join elimination.
My first thought was to fix this by:
+ if (!IndexCollMatchesExprColl(ind->indexcollations[c],
+ exprInputCollation((Node *) rinfo->clause)))
+ continue;
However, this caused an unexpected plan diff in join.out where a
left-join removal over (name, text) stopped working, because name and
text use different collations. So this check is too strict: a
mismatch between two deterministic collations should be OK for
uniqueness proof, as a deterministic collation treats two strings as
equal iff they are byte-wise equal (see CREATE COLLATION).
Hence, I got attached patch. Thoughts?
[1] https://postgr.es/m/CAMbWs48Dn2wW6XM94GZsoyMiH42=KgMo+WcobPKuWvGYnWaPOQ@mail.gmail.com
- Richard
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0001-Consider-collation-when-proving-uniqueness-from-u.patch | application/octet-stream | 9.4 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Andrey Borodin | 2026-04-24 11:58:15 | Re: Call for Posters: PGConf.dev 2026 |
| Previous Message | Zhijie Hou (Fujitsu) | 2026-04-24 11:26:05 | RE: Parallel Apply |