| From: | Richard Guo <guofenglinux(at)gmail(dot)com> |
|---|---|
| To: | Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Wrong results in remove_useless_groupby_columns() |
| Date: | 2026-05-05 10:12:16 |
| Message-ID: | CAMbWs49t6uArWoTT-cHY+nhsi23nJJKcF9Xb9cYGzaZ9kNJ98g@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
I seem to have fallen into a rabbit-hole around equality-relation
issues. I noticed another wrong-result issue caused by collation
mismatch. This time it is in remove_useless_groupby_columns().
remove_useless_groupby_columns() uses a relation's unique indexes to
prove that some GROUP BY columns are functionally dependent on others,
and so can be dropped from the GROUP BY clause. The match between
index columns and GROUP BY columns is done by attno alone.
However, a unique index under one collation does not prove uniqueness
under another. For example:
create collation ci (provider = icu, locale = 'und-u-ks-level2',
deterministic = false);
create table t (a text collate ci not null, b text);
insert into t values ('foo','X'), ('FOO','Y');
create unique index on t (a collate "C");
-- wrong results: should be 2 rows
select a, b from t group by a, b;
a | b
-----+---
foo | X
(1 row)
The wrong-result is caused by the planner incorrectly dropping b from
GROUP BY.
What is more, remove_useless_groupby_columns() does not bother to
check opfamily either. A unique index under one opfamily does not
prove uniqueness under the equality used by GROUP BY when the
SortGroupClause's eqop comes from a different opfamily. For example:
create type myrec as (x numeric);
create table opf_t (a myrec not null, b text);
create unique index on opf_t (a record_image_ops);
insert into opf_t values (row(1.0)::myrec, 'X'), (row(1.00)::myrec, 'Y');
-- wrong results: should be 2 rows
select a, b from opf_t group by a, b;
a | b
-------+---
(1.0) | X
(1 row)
ROW(1.0) and ROW(1.00) are bytewise distinct, so record_image_ops
admits both rows. But GROUP BY a uses the type's default record_ops,
under which they are logically equal, so the two rows merge into one
group. And again, the planner drops b and a row is silently lost.
Attached patch fixes both issues.
- Richard
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0001-Consider-opfamily-and-collation-when-removing-red.patch | application/octet-stream | 12.6 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Etsuro Fujita | 2026-05-05 10:19:54 | Re: Use-after-free issue in postgres_fdw |
| Previous Message | Jim Jones | 2026-05-05 10:06:44 | Re: COPY ON_CONFLICT TABLE; save duplicated record to another table. |