Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, rushabh(dot)lathia(at)gmail(dot)com
Subject: Re: [PATCH] Keeps tracking the uniqueness with UniqueKey
Date: 2020-05-07 01:31:51
Message-ID: CAKU4AWr1BmbQB4F7j22G+NS4dNuem6dKaUf+1BK8me61uBgqqg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I just uploaded the v7 version and split it into smaller commits for easier
review/merge. I also maintain a up-to-date README.uniquekey
document since something may changed during discussion or later code.

Here is the simple introduction of each commit.

====
1. v7-0001-Introduce-RelOptInfo-notnullattrs-attribute.patch

This commit adds the notnullattrs to RelOptInfo, which grabs the
information
from both catalog and user's query.

2. v7-0002-Introuduce-RelOptInfo.uniquekeys-attribute.patch

This commit just add the uniquekeys to RelOptInfo and maintain it at every
stage. However the upper level code is not changed due to this.

Some changes of this part in v7:
1). Removed the UniqueKey.positions attribute. In the past it is used in
convert_subquery_uniquekeys, however we don't need it actually (And I
maintained it wrong in the past). Now I build the relationship between
the
outer var to subuqery's TargetList with outrel.subquery.processed_tlist.
2). onerow UniqueKey(exprs = NIL) need to be converted to normal
uniquekey(exprs
!= NIL) if it is not one-row any more. This may happen on some outer
join.

3. v7-0003-Refactor-existing-uniqueness-related-code-to-use-.patch

Refactor the existing functions like innerrel_is_unique/res_is_distinct_for
to
use UniqueKey, and postpone the call of remove_useless_join and
reduce_unique_semijoins to use the new implementation.

4. v7-0004-Remove-distinct-node-AggNode-if-the-input-is-uniq.patch

Remove the distinct node if the result is distinct already. Remove the
aggnode
if the group by clause is unique already AND there is no aggregation
function in
query.

5. v7-0005-If-the-group-by-clause-is-unique-and-we-have-aggr.patch

If the group by clause is unique and query has aggregation function, we use
the AGG_SORT strategy but without really sort since it has only one row in
each
group.

6. v7-0006-Join-removal-at-run-time-with-UniqueKey.patch

This commit run join removal at build_join_rel. At that time, it can fully
uses
unique key. It can handle some more cases, I added some new test cases to
join.sql. However it can be a replacement of the current one. There are some
cases the new strategy can work run well but the current one can. Like

SELECT a.* FROM a LEFT JOIN (b left join c on b.c_id = c.id) ON (a.b_id =
b.id);

during the join a & b, the join can't be removed since b.id is still useful
in
future. However in the future, we know the b.id can be removed as well, but
it is too late to remove the previous join.

At the implementation part, the main idea is if the join_canbe_removed. we
will copy the pathlist from outerrel to joinrel. There are several items
need to
handle.

1. To make sure the overall join_search_one_level, we have to keep the
joinrel
even the innerrel is removed (rather than discard the joinrel).
2. If the innerrel can be removed, we don't need to build pathlist for
joinrel,
we just reuse the pathlist from outerrel. However there are many places
where
use assert rel->pathlist[*]->parent == rel. so I copied the pathlist, we
have to change the parent to joinrel.
3. During create plan for some path on RTE_RELATION, it needs to know the
relation Oid with path->parent->relid. so we have to use the
outerrel->relid
to overwrite the joinrel->relid which is 0 before.
4. Almost same paths as item 3, it usually assert
best_path->parent->rtekind ==
RTE_RELATION; now the path may appeared in joinrel, so I used
outerrel->rtekind to overwrite joinrel->rtekind.
5. I guess there are some dependencies between path->pathtarget and
rel->reltarget. since we reuse the pathlist of outerrel, so I used the
outer->reltarget as well. If the join can be removed, I guess the length
of
list_length(outrel->reltarget->exprs) >= (joinrel->reltarget->exprs). we
can
rely on the ProjectionPath to reduce the tlist.

My patches is based on the current latest commit fb544735f1.

Best Regards
Andy Fan

>

Attachment Content-Type Size
v7-0005-If-the-group-by-clause-is-unique-and-we-have-aggr.patch application/octet-stream 13.2 KB
v7-0001-Introduce-RelOptInfo-notnullattrs-attribute.patch application/octet-stream 2.9 KB
v7-0004-Remove-distinct-node-AggNode-if-the-input-is-uniq.patch application/octet-stream 23.4 KB
v7-0003-Refactor-existing-uniqueness-related-code-to-use-.patch application/octet-stream 19.2 KB
v7-0002-Introuduce-RelOptInfo.uniquekeys-attribute.patch application/octet-stream 48.2 KB
v7-0006-Join-removal-at-run-time-with-UniqueKey.patch application/octet-stream 13.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2020-05-07 02:21:09 Re: +(pg_lsn, int8) and -(pg_lsn, int8) operators
Previous Message دار الآثار للنشر والتوزيع-صنعاء Dar Alathar-Yemen 2020-05-07 00:59:28 رد: BUG #16419: wrong parsing BC year in to_date() function