| From: | Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> | 
|---|---|
| To: | David Rowley <dgrowleyml(at)gmail(dot)com> | 
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, rushabh(dot)lathia(at)gmail(dot)com | 
| Subject: | Re: [PATCH] Keeps tracking the uniqueness with UniqueKey | 
| Date: | 2020-05-07 01:31:51 | 
| Message-ID: | CAKU4AWr1BmbQB4F7j22G+NS4dNuem6dKaUf+1BK8me61uBgqqg@mail.gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
I just uploaded the v7 version and split it into smaller commits for easier
review/merge. I also maintain a  up-to-date README.uniquekey
document since something may changed during discussion or later code.
Here is the simple introduction of each commit.
====
1. v7-0001-Introduce-RelOptInfo-notnullattrs-attribute.patch
This commit adds the notnullattrs to RelOptInfo,  which grabs the
information
from both catalog and user's query.
2. v7-0002-Introuduce-RelOptInfo.uniquekeys-attribute.patch
This commit just add the uniquekeys to RelOptInfo and maintain it at every
stage. However the upper level code is not changed due to this.
Some changes of this part in v7:
1). Removed the UniqueKey.positions attribute. In the past it is used in
    convert_subquery_uniquekeys, however we don't need it actually (And I
    maintained it wrong in the past). Now I build the relationship between
the
    outer var to subuqery's TargetList with outrel.subquery.processed_tlist.
2). onerow UniqueKey(exprs = NIL) need to be converted to normal
uniquekey(exprs
   != NIL) if it is not one-row any more. This may happen on some outer
join.
3. v7-0003-Refactor-existing-uniqueness-related-code-to-use-.patch
Refactor the existing functions like innerrel_is_unique/res_is_distinct_for
to
use UniqueKey, and postpone the call of remove_useless_join and
reduce_unique_semijoins to use the new implementation.
4. v7-0004-Remove-distinct-node-AggNode-if-the-input-is-uniq.patch
Remove the distinct node if the result is distinct already.  Remove the
aggnode
if the group by clause is unique already AND there is no aggregation
function in
query.
5. v7-0005-If-the-group-by-clause-is-unique-and-we-have-aggr.patch
If the group by clause is unique and query has aggregation function, we use
the AGG_SORT strategy but without really sort since it has only one row in
each
group.
6. v7-0006-Join-removal-at-run-time-with-UniqueKey.patch
This commit run join removal at build_join_rel.  At that time, it can fully
uses
unique key. It can handle some more cases, I added some new test cases to
join.sql. However it can be a replacement of the current one. There are some
cases the new strategy can work run well but the current one can.  Like
SELECT a.* FROM a LEFT JOIN (b left join c on b.c_id = c.id) ON (a.b_id =
b.id);
during the join a & b, the join can't be removed since b.id is still useful
in
future. However in the future, we know the b.id can be removed as well, but
it is too late to remove the previous join.
At the implementation part,  the main idea is if the join_canbe_removed. we
will copy the pathlist from outerrel to joinrel. There are several items
need to
handle.
1. To make sure the overall join_search_one_level, we have to keep the
joinrel
   even the innerrel is removed (rather than discard the joinrel).
2. If the innerrel can be removed, we don't need to build pathlist for
joinrel,
   we just reuse the pathlist from outerrel. However there are many places
where
   use assert rel->pathlist[*]->parent == rel. so I copied the pathlist, we
   have to change the parent to joinrel.
3. During create plan for some path on RTE_RELATION, it needs to know the
   relation Oid with path->parent->relid. so we have to use the
outerrel->relid
   to overwrite the joinrel->relid which is 0 before.
4. Almost same paths as item 3, it usually assert
best_path->parent->rtekind ==
   RTE_RELATION; now the path may appeared in joinrel, so I used
   outerrel->rtekind to overwrite joinrel->rtekind.
5. I guess there are some dependencies between path->pathtarget and
   rel->reltarget. since we reuse the pathlist of outerrel, so I used the
   outer->reltarget as well. If the join can be removed, I guess the length
of
   list_length(outrel->reltarget->exprs) >= (joinrel->reltarget->exprs). we
can
   rely on the ProjectionPath to reduce the tlist.
My patches is based on the current latest commit fb544735f1.
Best Regards
Andy Fan
>
| Attachment | Content-Type | Size | 
|---|---|---|
| v7-0005-If-the-group-by-clause-is-unique-and-we-have-aggr.patch | application/octet-stream | 13.2 KB | 
| v7-0001-Introduce-RelOptInfo-notnullattrs-attribute.patch | application/octet-stream | 2.9 KB | 
| v7-0004-Remove-distinct-node-AggNode-if-the-input-is-uniq.patch | application/octet-stream | 23.4 KB | 
| v7-0003-Refactor-existing-uniqueness-related-code-to-use-.patch | application/octet-stream | 19.2 KB | 
| v7-0002-Introuduce-RelOptInfo.uniquekeys-attribute.patch | application/octet-stream | 48.2 KB | 
| v7-0006-Join-removal-at-run-time-with-UniqueKey.patch | application/octet-stream | 13.1 KB | 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Kyotaro Horiguchi | 2020-05-07 02:21:09 | Re: +(pg_lsn, int8) and -(pg_lsn, int8) operators | 
| Previous Message | دار الآثار للنشر والتوزيع-صنعاء Dar Alathar-Yemen | 2020-05-07 00:59:28 | رد: BUG #16419: wrong parsing BC year in to_date() function |