From: | Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> |
---|---|
To: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, rushabh(dot)lathia(at)gmail(dot)com |
Subject: | Re: [PATCH] Keeps tracking the uniqueness with UniqueKey |
Date: | 2020-04-15 00:18:48 |
Message-ID: | CAKU4AWpXSryGjHO6Mgs5GcJVQyE725SN9SJujQb4VsLhVRTAQA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi David:
Thanks for your time.
> 1. Out of date comment in join.sql
>
> -- join removal is not possible when the GROUP BY contains a column that is
> -- not in the join condition. (Note: as of 9.6, we notice that b.id is a
> -- primary key and so drop b.c_id from the GROUP BY of the resulting plan;
> -- but this happens too late for join removal in the outer plan level.)
> explain (costs off)
> select d.* from d left join (select d, c_id from b group by b.d, b.c_id) s
> on d.a = s.d;
>
> You've changed the GROUP BY clause so it does not include b.id, so the
> Note in the comment is now misleading.
>
Thanks, I will fix this one in the following patch.
>
> 2. I think 0002 is overly restrictive in its demands that
> parse->hasAggs must be false. We should be able to just use a Group
> Aggregate with unsorted input when the input_rel is unique on the
> GROUP BY clause. This will save on hashing and sorting. Basically
> similar to what we do for when a query contains aggregates without any
> GROUP BY.
>
>
Yes, This will be a perfect result, the difficult is the current
aggregation function
execution is highly coupled with Agg node(ExecInitAgg) which is removed in
the
unique case. I ever make the sum (w/o finalfn) and avg(with finalfn)
works in a hack way, but still many stuffs is not handled. Let me prepare
the code
for this purpose in 1~2 days to see if I'm going with the right direction.
Ashutosh also has an idea[1] that if the relation underlying an Agg node is
known to be unique for given groupByClause, we could safely use
AGG_SORTED strategy. Though the input is not ordered, it's sorted thus for
every row Agg
node will combine/finalize the aggregate result.
I will target the perfect result first and see how many effort do we need,
if not,
I will try Ashutosh's suggestion.
> 3. I don't quite understand why you changed this to a right join:
>
> -- Test case where t1 can be optimized but not t2
> explain (costs off) select t1.*,t2.x,t2.z
> -from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
> +from t1 right join t2 on t1.a = t2.x and t1.b = t2.y
>
> Perhaps this change is left over from some previous version of the patch?
>
This is on purpose. the original test case is used to test we can short
the group key for t1 but not t2 for aggregation, but if I keep the inner
join, the
aggnode will be removed totally, so I have to change it to right join in
order
to keep the aggnode. The full test case is:
-- Test case where t1 can be optimized but not t2
explain (costs off) select t1.*,t2.x,t2.z
from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.z;
where (a, b) is the primary key of t1.
From | Date | Subject | |
---|---|---|---|
Next Message | Mark Dilger | 2020-04-15 00:55:11 | Re: Perl modules for testing/viewing/corrupting/repairing your heap files |
Previous Message | Tom Lane | 2020-04-15 00:15:53 | Re: wrong relkind error messages |