Re: Enable partitionwise join for partition keys wrapped by RelabelType

From: Matheus Alcantara <matheusssilv97(at)gmail(dot)com>
To: jian he <jian(dot)universality(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Enable partitionwise join for partition keys wrapped by RelabelType
Date: 2026-02-25 18:51:02
Message-ID: 0132f1c4-f701-4e2d-9022-e3e95cdb01d5@gmail.com
Lists: pgsql-hackers

On 25/02/26 09:46, jian he wrote:
> On Tue, Jan 27, 2026 at 11:42 PM Matheus Alcantara
> <matheusssilv97(at)gmail(dot)com> wrote:
>>
>> Although this makes sense to me, I see a difference in row estimation
>> using your v2 patch for the following example:
>>
>> ...
>>
>>
>> V1 patch:
>>
>> postgres=# select * from check_estimated_rows($$ SELECT FROM t1, t2 WHERE t1.c = t2.c GROUP BY t1.c, t2.c $$);
>>  estimated | actual
>> -----------+--------
>>         12 |     12
>>
>> V2 patch:
>>
>> postgres=# select * from check_estimated_rows($$ SELECT FROM t1, t2 WHERE t1.c = t2.c GROUP BY t1.c, t2.c $$);
>>  estimated | actual
>> -----------+--------
>>        144 |     12
>>
>> I've also tried making the partitions of t1 and t2 foreign tables,
>> and I got the same row estimation difference.
>>
>> I'm just wondering if we are missing something?
>>
> Hi.
>
> In function process_equivalence, we have:
>     /*
>      * Ensure both input expressions expose the desired collation (their types
>      * should be OK already); see comments for canonicalize_ec_expression.
>      */
>     item1 = canonicalize_ec_expression(item1,
>                                        exprType((Node *) item1),
>                                        collation);
>     item2 = canonicalize_ec_expression(item2,
>                                        exprType((Node *) item2),
>                                        collation);
>
>
> Let's simplify the test case.
> CREATE COLLATION case_insensitive (provider = icu, locale =
> '@colStrength=secondary', deterministic = false);
> CREATE DOMAIN d_txt1 AS text collate case_insensitive;
> CREATE TABLE t3 (a int, b int, c text);
> INSERT INTO t3 SELECT i % 12, i, to_char(i/50, 'FM0000') FROM
> generate_series(0, 599, 2) i;
> ANALYZE t3;
> CREATE TABLE t4 (a int, b int, c d_txt1);
> INSERT INTO t4 SELECT i % 10, i, to_char(i/50, 'FM0000') FROM
> generate_series(0, 599, 3) i;
> ANALYZE t4;
> EXPLAIN SELECT FROM t3, t4 WHERE t3.c = t4.c GROUP BY t3.c, t4.c;
>
> After process_equivalence handles ``WHERE t3.c = t4.c``, it produces
> two RelabelType nodes in EquivalenceClass->ec_members->em_expr. Your v1
> unconditionally strips these two RelabelType nodes, so exprs_known_equal
> returns true, and the numdistinct estimation therefore treats
> "GROUP BY t3.c, t4.c" as the same as "GROUP BY t3.c".
>
> However, if we do not unconditionally strip RelabelType, exprs_known_equal
> returns false, so in "GROUP BY t3.c, t4.c" the expressions "t3.c" and
> "t4.c" are not considered identical, and the two distinct counts are
> multiplied. IMHO, that's the reason for the estimate of 144 versus 12.
>

OK, that makes sense to me, thanks for pointing this out.
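
To make the arithmetic concrete, here is a toy model of the behaviour
described above. It is only a sketch, not the planner's actual code:
toy_group_estimate is a hypothetical helper, and the figure of 12 distinct
values per column is an assumption chosen to match the estimates in the
quoted output.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model only; this is NOT PostgreSQL's estimate_num_groups().
 *
 * If the two grouping expressions are known to be equal, the second one
 * adds no new groups, so only one distinct count is used (the smaller one
 * here).  Otherwise the two distinct counts are multiplied.
 */
static double
toy_group_estimate(double ndistinct1, double ndistinct2, bool known_equal)
{
    assert(ndistinct1 >= 0 && ndistinct2 >= 0);

    if (known_equal)
        return (ndistinct1 < ndistinct2) ? ndistinct1 : ndistinct2;

    return ndistinct1 * ndistinct2;
}
```

With 12 distinct values on each side, toy_group_estimate(12, 12, true)
gives 12 and toy_group_estimate(12, 12, false) gives 144, which lines up
with the 12 versus 144 estimates. The real planner applies further
adjustments (clamping to the input row count, for example) that this
sketch ignores.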

So do you think that the v3 attached in [1] correctly addresses this?

> Please also see the comments in canonicalize_ec_expression.
> Actually, I think the comments in canonicalize_ec_expression discourage
> stripping RelabelType nodes when RelabelType->resultcollid differs from
> the collation of RelabelType->arg.
>

Agreed, thanks.
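
As a rough illustration of that rule, the sketch below strips a relabel
node only when it exposes the same collation as its argument. Everything
in it is hypothetical (ToyExpr, ToyTag, ToyOid, toy_strip_relabel); it
uses stand-in structs rather than PostgreSQL's node types and is not taken
from any version of the patch.

```c
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int ToyOid;    /* stand-in for a collation OID */

typedef enum { TOY_VAR, TOY_RELABEL } ToyTag;

/* Hypothetical, simplified expression node; not a PostgreSQL struct. */
typedef struct ToyExpr
{
    ToyTag          tag;
    ToyOid          collid;     /* collation exposed by this node */
    struct ToyExpr *arg;        /* input of a TOY_RELABEL node, else NULL */
} ToyExpr;

/*
 * Strip relabel nodes only while they leave the collation unchanged.
 * A relabel whose collation differs from its argument's is kept, since
 * removing it would change which collation the expression exposes.
 */
static const ToyExpr *
toy_strip_relabel(const ToyExpr *expr)
{
    while (expr != NULL &&
           expr->tag == TOY_RELABEL &&
           expr->arg != NULL &&
           expr->collid == expr->arg->collid)
        expr = expr->arg;

    return expr;
}
```

In this model, a relabel that only changes the type label is looked
through, while one that switches to a different collation (such as the
domain's case_insensitive collation in the test case) stays in place, so
the two sides of the join clause would not be treated as the same
expression.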

[1]
https://www.postgresql.org/message-id/DFZHIGROJHVS.25OYGENTHBLSM%40gmail.com

--
Matheus Alcantara
EDB: https://www.enterprisedb.com
