Re: [PATCH] Erase the distinctClause if the result is unique by definition

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Erase the distinctClause if the result is unique by definition
Date: 2020-03-10 13:50:23
Message-ID: CAExHW5u9VMU4nq1Z4pTYTfngFd1BYi0pwmmV6a3OB47T=XaBTQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Andy,

On Tue, Mar 10, 2020 at 1:49 PM Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:
>
> Hi David:
>
>>
>> 3. Perhaps in add_paths_to_joinrel(), or maybe when creating the join
>> rel itself (I've not looked for the best location in detail),
>> determine if the join can cause rows to be duplicated. If it can't,
>> then add the UniqueKeys from that rel.
>
>
> I have some concerns about this method, maybe I misunderstand
> something, if so, please advise.
>
> In my current implementation, it calculates the uniqueness for each
> BaseRel only, but in your way, looks we need to calculate the
> UniquePathKey for both BaseRel and JoinRel. This makes more
> difference for multi table join.

I didn't understand this concern. I think, it would be better to do it
for all kinds of relation types base, join etc. This way we are sure
that one method works across the planner to eliminate the need for
Distinct or grouping. If we just implement something for base
relations right now and don't do that for joins, there is a chance
that that method may not work for joins when we come to implement it.

> Another concern is UniquePathKey
> is designed for a general purpose, we need to maintain it no matter
> distinctClause/groupbyClause.

This should be ok. The time spent in annotating a RelOptInfo about
uniqueness is not going to be a lot. But doing so would help generic
elimination of Distinct/Group/Unique operations. What is
UniquePathKey; I didn't find this in your patch or in the code.

>
>
>>
>> For example: SELECT * FROM t1
>> INNER JOIN t2 ON t1.unique = t2.not_unique; would have the joinrel for
>> {t1,t2} only take the unique keys from t2 (t1 can't duplicate t2 rows
>> since it's an eqijoin and t1.unique has a unique index).
>
>
> Thanks for raising this. My current rule requires *every* relation yields a
> unique result and *no matter* with the join method. Actually I want to make
> the rule less strict, for example, we may just need 1 table yields unique result
> and the final result will be unique as well under some join type.

That is desirable.

>
> As for the t1 INNER JOIN t2 ON t1.unique = t2.not_unique; looks we can't
> remove the distinct based on this.
>
> create table m1(a int primary key, b int);
> create table m2(a int primary key, b int);
> insert into m1 values(1, 1), (2, 1);
> insert into m2 values(1, 1), (2, 1);
> select distinct m1.a from m1, m2 where m1.a = m2.b;

IIUC, David's rule is other way round. "select distinct m2.a from m1,
m2 where m1.a = m2.b" won't need DISTINCT node since the result of
joining m1 and m2 has unique value of m2.a for each row. In your
example the join will produce two rows (m1.a, m1.b, m2.a, m2.b) (1, 1,
1, 1) and (1, 1, 2, 1) where m2.a is unique key.

--
Best Wishes,
Ashutosh Bapat

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2020-03-10 13:53:36 Re: Add an optional timeout clause to isolationtester step.
Previous Message Peter Eisentraut 2020-03-10 13:47:47 Re: Improve handling of parameter differences in physical replication