Re: Enable partitionwise join for partition keys wrapped by RelabelType

From: "Matheus Alcantara" <matheusssilv97(at)gmail(dot)com>
To: "jian he" <jian(dot)universality(at)gmail(dot)com>, "Matheus Alcantara" <matheusssilv97(at)gmail(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enable partitionwise join for partition keys wrapped by RelabelType
Date: 2026-01-27 15:42:10
Message-ID: DFZHIGROJHVS.25OYGENTHBLSM@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu Jan 15, 2026 at 12:30 AM -03, jian he wrote:
> On Mon, Dec 15, 2025 at 10:46 PM Matheus Alcantara
> <matheusssilv97(at)gmail(dot)com> wrote:
>> The function exprs_known_equal() is used by the planner to determine if
>> two expressions are semantically equivalent, often by checking if they
>> belong to the same Equivalence Class (EC).
>>
> src/include/nodes/primnodes.h CollateExpr comments:
> /*----------
> * CollateExpr - COLLATE
> *
> * The planner replaces CollateExpr with RelabelType during expression
> * preprocessing, so execution never sees a CollateExpr.
> *----------
> */
>
> examine_variable handling RelabelType (transformed from CollateExpr) is wrong, i
> think. Roughly speaking it will reduce "t2.c collate case_insensitive" to
> "t2.c". see [1].
>
> If examine_variable does not strip the RelabelType(CollateExpr) node, then
> estimate_num_groups->add_unique_group_var may need to deal with RelabelType.
>
> The estimate_num_groups function should account for collation settings. "GROUP
> BY a" and "GROUP BY a COLLATE case_insensitive" may result in different row
> estimates and should be handled distinctly. Therefore exprs_known_equal within
> add_unique_group_var must ensure collation is compared before assuming equality.
>
> In this context, while strippping RelabelType, we should ensure
> exprCollation(RelabelType->arg) is the same as the RelabelType->resultcollid.
>
> However, it does not affect partitionwise join, because commit [2] already fixed
> the collation issue. so we don't have to worry about
> have_partkey_equi_join->exprs_known_equal code path for the RelabelType node.
>
> so i change exprs_known_equal to:
>
> + /* Remove any relabel decorations if collation match */
> + if (IsA(expr, RelabelType))
> + {
> + RelabelType *relabel = castNode(RelabelType, expr);
> + Expr *rexpr = (Expr *) relabel;
> +
> + while (rexpr && IsA(rexpr, RelabelType))
> + rexpr = ((RelabelType *) rexpr)->arg;
> +
> + if (exprCollation((Node *) rexpr) == relabel->resultcollid)
> + expr = rexpr;
> + }
>
> [1] https://postgr.es/m/CACJufxGLCiyhM+P0gxesg2x--PTrMY3PszqSqOq_H4QS_oq3Jg@mail.gmail.com
> [2] https://git.postgresql.org/cgit/postgresql.git/commit/?id=075acdd93388c080c0fb0aca5723144ad7a56dac
>
Thanks for taking a look at this.

Although this make sense to me I see difference in row estimation using
your v2 patch for the following example:

CREATE COLLATION case_insensitive (provider = icu, locale = '@colStrength=secondary', deterministic = false);
CREATE DOMAIN d_txt1 AS text collate case_insensitive;
CREATE TABLE t1 (a int, b int, c text) PARTITION BY LIST(c);
CREATE TABLE t1_p1 PARTITION OF t1 FOR VALUES IN ('0000', '0003', '0004', '0010');
CREATE TABLE t1_p2 PARTITION OF t1 FOR VALUES IN ('0001', '0005', '0002', '0009');
CREATE TABLE t1_p3 PARTITION OF t1 FOR VALUES IN ('0006', '0007', '0008', '0011');
INSERT INTO t1 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 599, 2) i;
ANALYZE t1;

CREATE TABLE t2 (a int, b int, c d_txt1) PARTITION BY LIST(c);
CREATE TABLE t2_p1 PARTITION OF t2 FOR VALUES IN ('0000', '0003', '0004', '0010');
CREATE TABLE t2_p2 PARTITION OF t2 FOR VALUES IN ('0001', '0005', '0002', '0009');
CREATE TABLE t2_p3 PARTITION OF t2 FOR VALUES IN ('0006', '0007', '0008', '0011');
INSERT INTO t2 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 599, 3) i;
ANALYZE t2;

SET enable_partitionwise_join TO true;

V1 patch:

postgres=# select * from check_estimated_rows($$ SELECT FROM t1, t2 WHERE t1.c = t2.c GROUP BY t1.c, t2.c $$);
estimated | actual
-----------+--------
12 | 12

V2 patch:

postgres=# select * from check_estimated_rows($$ SELECT FROM t1, t2 WHERE t1.c = t2.c GROUP BY t1.c, t2.c $$);
estimated | actual
-----------+--------
144 | 12

I've also tried to make the partitions of t1 and t2 as foreign tables
and I got the same row estimation difference.

I'm just wondering if we are missing something?

About the v2 patch:

+ if (IsA(expr, RelabelType))
+ {
+ RelabelType *relabel = castNode(RelabelType, expr);
+ Expr *rexpr = (Expr *) relabel;
+
+ while (rexpr && IsA(rexpr, RelabelType))
+ rexpr = ((RelabelType *) rexpr)->arg;
+
+ if (exprCollation((Node *) rexpr) == relabel->resultcollid)
+ expr = rexpr;

I think that unwrapping all Relabel types may ignore intermediate
states. For example, consider Relabel A -> Relabel B -> Base expression:
This code will unwrap Relabel A and Relabel B and check the expression
collation directly on Base expression, shouldn't we check every layer,
e.g Relabel A, Relabel B and Base Expression? Please see the attached v3
version with a simplified version of v2 that also check every layer of a
RelabelType node.

--
Matheus Alcantara
EDB: https://www.enterprisedb.com

Attachment Content-Type Size
v3-0001-Enable-partitionwise-join-for-partition-keys-wrap.patch text/plain 6.7 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ilia Evdokimov 2026-01-27 15:43:00 Re: Hash-based MCV matching for large IN-lists
Previous Message Mahendra Singh Thalor 2026-01-27 15:40:50 Re: Non-text mode for pg_dumpall