Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, rushabh(dot)lathia(at)gmail(dot)com
Subject: Re: [PATCH] Keeps tracking the uniqueness with UniqueKey
Date: 2020-04-15 00:18:48
Message-ID: CAKU4AWpXSryGjHO6Mgs5GcJVQyE725SN9SJujQb4VsLhVRTAQA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi David:

Thanks for your time.

> 1. Out of date comment in join.sql
>
> -- join removal is not possible when the GROUP BY contains a column that is
> -- not in the join condition. (Note: as of 9.6, we notice that b.id is a
> -- primary key and so drop b.c_id from the GROUP BY of the resulting plan;
> -- but this happens too late for join removal in the outer plan level.)
> explain (costs off)
> select d.* from d left join (select d, c_id from b group by b.d, b.c_id) s
> on d.a = s.d;
>
> You've changed the GROUP BY clause so it does not include b.id, so the
> Note in the comment is now misleading.
>

Thanks, I will fix this one in the following patch.

>
> 2. I think 0002 is overly restrictive in its demands that
> parse->hasAggs must be false. We should be able to just use a Group
> Aggregate with unsorted input when the input_rel is unique on the
> GROUP BY clause. This will save on hashing and sorting. Basically
> similar to what we do for when a query contains aggregates without any
> GROUP BY.
>
>
Yes, This will be a perfect result, the difficult is the current
aggregation function
execution is highly coupled with Agg node(ExecInitAgg) which is removed in
the
unique case. I ever make the sum (w/o finalfn) and avg(with finalfn)
works in a hack way, but still many stuffs is not handled. Let me prepare
the code
for this purpose in 1~2 days to see if I'm going with the right direction.

Ashutosh also has an idea[1] that if the relation underlying an Agg node is
known to be unique for given groupByClause, we could safely use
AGG_SORTED strategy. Though the input is not ordered, it's sorted thus for
every row Agg
node will combine/finalize the aggregate result.

I will target the perfect result first and see how many effort do we need,
if not,
I will try Ashutosh's suggestion.

> 3. I don't quite understand why you changed this to a right join:
>
> -- Test case where t1 can be optimized but not t2
> explain (costs off) select t1.*,t2.x,t2.z
> -from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
> +from t1 right join t2 on t1.a = t2.x and t1.b = t2.y
>
> Perhaps this change is left over from some previous version of the patch?
>

This is on purpose. the original test case is used to test we can short
the group key for t1 but not t2 for aggregation, but if I keep the inner
join, the
aggnode will be removed totally, so I have to change it to right join in
order
to keep the aggnode. The full test case is:

-- Test case where t1 can be optimized but not t2

explain (costs off) select t1.*,t2.x,t2.z

from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y

group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.z;

where (a, b) is the primary key of t1.

[1]
https://www.postgresql.org/message-id/CAExHW5sY%2BL6iZ%3DrwnL7n3jET7aNLCNQimvfcS7C%2B5wmdjmdPiw%40mail.gmail.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Dilger 2020-04-15 00:55:11 Re: Perl modules for testing/viewing/corrupting/repairing your heap files
Previous Message Tom Lane 2020-04-15 00:15:53 Re: wrong relkind error messages