| From: | "Matheus Alcantara" <matheusssilv97(at)gmail(dot)com> |
|---|---|
| To: | "jian he" <jian(dot)universality(at)gmail(dot)com>, "Matheus Alcantara" <matheusssilv97(at)gmail(dot)com> |
| Cc: | <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Enable partitionwise join for partition keys wrapped by RelabelType |
| Date: | 2026-01-27 15:42:10 |
| Message-ID: | DFZHIGROJHVS.25OYGENTHBLSM@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Thu Jan 15, 2026 at 12:30 AM -03, jian he wrote:
> On Mon, Dec 15, 2025 at 10:46 PM Matheus Alcantara
> <matheusssilv97(at)gmail(dot)com> wrote:
>> The function exprs_known_equal() is used by the planner to determine if
>> two expressions are semantically equivalent, often by checking if they
>> belong to the same Equivalence Class (EC).
>>
> src/include/nodes/primnodes.h CollateExpr comments:
> /*----------
> * CollateExpr - COLLATE
> *
> * The planner replaces CollateExpr with RelabelType during expression
> * preprocessing, so execution never sees a CollateExpr.
> *----------
> */
>
> examine_variable handling RelabelType (transformed from CollateExpr) is wrong, i
> think. Roughly speaking it will reduce "t2.c collate case_insensitive" to
> "t2.c". see [1].
>
> If examine_variable does not strip the RelabelType(CollateExpr) node, then
> estimate_num_groups->add_unique_group_var may need to deal with RelabelType.
>
> The estimate_num_groups function should account for collation settings. "GROUP
> BY a" and "GROUP BY a COLLATE case_insensitive" may result in different row
> estimates and should be handled distinctly. Therefore exprs_known_equal within
> add_unique_group_var must ensure collation is compared before assuming equality.
>
> In this context, while strippping RelabelType, we should ensure
> exprCollation(RelabelType->arg) is the same as the RelabelType->resultcollid.
>
> However, it does not affect partitionwise join, because commit [2] already fixed
> the collation issue. so we don't have to worry about
> have_partkey_equi_join->exprs_known_equal code path for the RelabelType node.
>
> so i change exprs_known_equal to:
>
> + /* Remove any relabel decorations if collation match */
> + if (IsA(expr, RelabelType))
> + {
> + RelabelType *relabel = castNode(RelabelType, expr);
> + Expr *rexpr = (Expr *) relabel;
> +
> + while (rexpr && IsA(rexpr, RelabelType))
> + rexpr = ((RelabelType *) rexpr)->arg;
> +
> + if (exprCollation((Node *) rexpr) == relabel->resultcollid)
> + expr = rexpr;
> + }
>
> [1] https://postgr.es/m/CACJufxGLCiyhM+P0gxesg2x--PTrMY3PszqSqOq_H4QS_oq3Jg@mail.gmail.com
> [2] https://git.postgresql.org/cgit/postgresql.git/commit/?id=075acdd93388c080c0fb0aca5723144ad7a56dac
>
Thanks for taking a look at this.
Although this make sense to me I see difference in row estimation using
your v2 patch for the following example:
CREATE COLLATION case_insensitive (provider = icu, locale = '@colStrength=secondary', deterministic = false);
CREATE DOMAIN d_txt1 AS text collate case_insensitive;
CREATE TABLE t1 (a int, b int, c text) PARTITION BY LIST(c);
CREATE TABLE t1_p1 PARTITION OF t1 FOR VALUES IN ('0000', '0003', '0004', '0010');
CREATE TABLE t1_p2 PARTITION OF t1 FOR VALUES IN ('0001', '0005', '0002', '0009');
CREATE TABLE t1_p3 PARTITION OF t1 FOR VALUES IN ('0006', '0007', '0008', '0011');
INSERT INTO t1 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 599, 2) i;
ANALYZE t1;
CREATE TABLE t2 (a int, b int, c d_txt1) PARTITION BY LIST(c);
CREATE TABLE t2_p1 PARTITION OF t2 FOR VALUES IN ('0000', '0003', '0004', '0010');
CREATE TABLE t2_p2 PARTITION OF t2 FOR VALUES IN ('0001', '0005', '0002', '0009');
CREATE TABLE t2_p3 PARTITION OF t2 FOR VALUES IN ('0006', '0007', '0008', '0011');
INSERT INTO t2 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 599, 3) i;
ANALYZE t2;
SET enable_partitionwise_join TO true;
V1 patch:
postgres=# select * from check_estimated_rows($$ SELECT FROM t1, t2 WHERE t1.c = t2.c GROUP BY t1.c, t2.c $$);
estimated | actual
-----------+--------
12 | 12
V2 patch:
postgres=# select * from check_estimated_rows($$ SELECT FROM t1, t2 WHERE t1.c = t2.c GROUP BY t1.c, t2.c $$);
estimated | actual
-----------+--------
144 | 12
I've also tried to make the partitions of t1 and t2 as foreign tables
and I got the same row estimation difference.
I'm just wondering if we are missing something?
About the v2 patch:
+ if (IsA(expr, RelabelType))
+ {
+ RelabelType *relabel = castNode(RelabelType, expr);
+ Expr *rexpr = (Expr *) relabel;
+
+ while (rexpr && IsA(rexpr, RelabelType))
+ rexpr = ((RelabelType *) rexpr)->arg;
+
+ if (exprCollation((Node *) rexpr) == relabel->resultcollid)
+ expr = rexpr;
I think that unwrapping all Relabel types may ignore intermediate
states. For example, consider Relabel A -> Relabel B -> Base expression:
This code will unwrap Relabel A and Relabel B and check the expression
collation directly on Base expression, shouldn't we check every layer,
e.g Relabel A, Relabel B and Base Expression? Please see the attached v3
version with a simplified version of v2 that also check every layer of a
RelabelType node.
--
Matheus Alcantara
EDB: https://www.enterprisedb.com
| Attachment | Content-Type | Size |
|---|---|---|
| v3-0001-Enable-partitionwise-join-for-partition-keys-wrap.patch | text/plain | 6.7 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Ilia Evdokimov | 2026-01-27 15:43:00 | Re: Hash-based MCV matching for large IN-lists |
| Previous Message | Mahendra Singh Thalor | 2026-01-27 15:40:50 | Re: Non-text mode for pg_dumpall |