| From: | Matheus Alcantara <matheusssilv97(at)gmail(dot)com> |
|---|---|
| To: | jian he <jian(dot)universality(at)gmail(dot)com> |
| Cc: | pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: Enable partitionwise join for partition keys wrapped by RelabelType |
| Date: | 2026-02-25 18:51:02 |
| Message-ID: | 0132f1c4-f701-4e2d-9022-e3e95cdb01d5@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On 25/02/26 09:46, jian he wrote:
> On Tue, Jan 27, 2026 at 11:42 PM Matheus Alcantara
> <matheusssilv97(at)gmail(dot)com> wrote:
>>
>> Although this make sense to me I see difference in row estimation using
>> your v2 patch for the following example:
>>
>> ...
>>
>>
>> V1 patch:
>>
>> postgres=# select * from check_estimated_rows($$ SELECT FROM t1, t2 WHERE t1.c = t2.c GROUP BY t1.c, t2.c $$);
>> estimated | actual
>> -----------+--------
>> 12 | 12
>>
>> V2 patch:
>>
>> postgres=# select * from check_estimated_rows($$ SELECT FROM t1, t2 WHERE t1.c = t2.c GROUP BY t1.c, t2.c $$);
>> estimated | actual
>> -----------+--------
>> 144 | 12
>>
>> I've also tried to make the partitions of t1 and t2 as foreign tables
>> and I got the same row estimation difference.
>>
>> I'm just wondering if we are missing something?
>>
> Hi.
>
> in function process_equivalence, we have:
> /*
> * Ensure both input expressions expose the desired collation (their types
> * should be OK already); see comments for canonicalize_ec_expression.
> */
> item1 = canonicalize_ec_expression(item1,
> exprType((Node *) item1),
> collation);
> item2 = canonicalize_ec_expression(item2,
> exprType((Node *) item2),
> collation);
>
>
> Let's simplify the test case.
> CREATE COLLATION case_insensitive (provider = icu, locale =
> '@colStrength=secondary', deterministic = false);
> CREATE DOMAIN d_txt1 AS text collate case_insensitive;
> CREATE TABLE t3 (a int, b int, c text);
> INSERT INTO t3 SELECT i % 12, i, to_char(i/50, 'FM0000') FROM
> generate_series(0, 599, 2) i;
> ANALYZE t3;
> CREATE TABLE t4 (a int, b int, c d_txt1);
> INSERT INTO t4 SELECT i % 10, i, to_char(i/50, 'FM0000') FROM
> generate_series(0, 599, 3) i;
> ANALYZE t4;
> EXPLAIN SELECT FROM t3, t4 WHERE t3.c = t4.c GROUP BY t3.c, t4.c;
>
> The ``WHERE t3.c = t4.c `` after the function process_equivalence, it will
> produce 2 RELABELTYPE node in EquivalenceClass->ec_members->em_expr and your v1
> uncondition strip these two RELABELTYPE, exprs_known_equal will retrun true,
> therefore for numdistinct it will think "GROUP BY t3.c, t4.c" is the same as
> ""GROUP BY t3.c".
>
> However if we not unconditionly strip RELABELTYPE, exprs_known_equal will return
> false, therefore "GROUP BY t3.c, t4.c", "t3.c", "t4.c" is not identical, so it
> multiply these two distinct numbers. IMHO, That's the reason for
> estimate number 144 versus 12.
>
Ok, that make sense to me, thanks for pointing this out.
So do you think that v3 attached on [1] correctly address this?
> Please also see the comments in canonicalize_ec_expression.
> Actually, I think the comments in canonicalize_ec_expression discourage strip
> RelabelType nodes when RelabelType->resultcollid differs from the collation of
> RelabelType->arg.
>
Agree, thanks.
[1]
https://www.postgresql.org/message-id/DFZHIGROJHVS.25OYGENTHBLSM%40gmail.com
--
Matheus Alcantara
EDB: https://www.enterprisedb.com
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Masahiko Sawada | 2026-02-25 19:03:08 | Re: Initial COPY of Logical Replication is too slow |
| Previous Message | Tom Lane | 2026-02-25 18:32:58 | Re: AIX support |