Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Floris Van Nee <florisvannee(at)optiver(dot)com>
Subject: Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)
Date: 2021-07-07 09:00:05
Message-ID: CAKU4AWqyQ9JfGfQzH_2WKG-k=FeAHUqAPt8E-=E2wtyVB96OLQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jul 6, 2021 at 5:34 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> On Sun, 4 Jul 2021 at 02:08, Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:
> > I'd start to work on UniqueKey again, it would be great that we can target it
> > to PG 15. The attached patch is just for the notnull_attrs. Since we can't say
> > a column is nullable or not without saying in which resultset, So I think attaching
> > it to RelOptInfo is unavoidable. Here is how my patch works.
>
> I'd also like to see this work progress for PG15.

Thank you David!

I am re-designing/implementing the UniqueKey, but it is better to have
a design review as soon as possible. This writing is for that. To make the
review easier, I also uploaded my in-completed patch (Correct, runable
with testcase).

Main changes are:
1. Use EC instead of expr, to cover more UniqueKey case.
2. Redesign the UniqueKey as below:

@@ -246,6 +246,7 @@ struct PlannerInfo
* subquery outputs */

List *eq_classes; /* list of active EquivalenceClasses */
+ List *unique_exprs; /* List of unique expr */

bool ec_merging_done; /* set true once ECs are canonical */

+typedef struct UniqueKey
+{
+ NodeTag type;
+ Bitmapset *unique_expr_indexes;
+ bool multi_nulls;
+} UniqueKey;
+

PlannerInfo.unique_exprs is a List of unique exprs. Unique Exprs is a set of
EquivalenceClass. for example:

CREATE TABLE T1(A INT NOT NULL, B INT NOT NULL, C INT, pk INT primary key);
CREATE UNIQUE INDEX ON t1(a, b);

SELECT DISTINCT * FROM T1 WHERE a = c;

Then we would have PlannerInfo.unique_exprs as below
[
[EC(a, c), EC(b)],
[EC(pk)]
]

RelOptInfo(t1) would have 2 UniqueKeys.
UniqueKey1 {unique_expr_indexes=bms{0}, multinull=false]
UniqueKey2 {unique_expr_indexes=bms{1}, multinull=false]

The design will benefit many table joins cases. For example, 10 tables
join. Each table has a primary key (a, b). Then we would have a UniqueKey like
this.

JoinRel{1,2,3,4} - {t1.a, t1.b, t2.a, t2.b, t3.a, t3.b t4.a t4.b}
JoinRel{1,2,3,4, 5} - {t1.a, t1.b, t2.a, t2.b, t3.a, t3.b t4.a t4.b t5.a t5.b}

This would be memory consuming and building such UniqueKey is CPU consuming as
well. With the new design, we can store it as

PlannerInfo.unique_exprs =
[
[t1.a, t1.b], -- EC is ignored in document.
[t2.a, t2.b],
[t3.a, t3.b],
[t4.a, t4.b],
[t5.a, t5.b],
[t6.a, t6.b],
[t7.a, t7.b],
[t8.a, t8.b],
[t9.a, t9.b],
[t10.a, t10.b],
]

JoinRel{1,2,3,4} - Bitmapset{0,1,2,3} -- one bitmapword.
JoinRel{1,2,3,4,5} - Bitmapset{0,1,2,3,4} -- one bitmapword.

3. Define a new SingleRow node and use it in joinrel as well.

+typedef struct SingleRow
+{
+ NodeTag type;
+ Index relid;
+} SingleRow;

SELECT * FROM t1, t2 WHERE t2.pk = 1;

PlannerInfo.unique_exprs
[
[t1.a, t1.b],
SingleRow{relid=2}
]

JoinRel{t1} - Bitmapset{0}
JoinRel{t2} - Bitmapset{1}
JoinRelt{1, 2} Bitmapset{0, 1} -- SingleRow will never be expanded to dedicated
exprs.

4. Cut the useless UniqueKey totally on the baserel stage based on
root->distinct_pathkey. If we want to use it anywhere else, I think this
design is OK as well. for example: group by UniqueKey.

5. Implemented the relation_is_distinct_for(root, rel, distinct_pathkey)
effectively. Here I used distinct_pathkey rather than
Query->distinctClause.

Since I implemented the EC in PlannerInfo.unique_exprs point to the
PathKey.pk_eqclass, so we can compare the address directly with '=', rather than
equal(a, b). (since qual would check the address as well, so even I use equal,
the performance is good as well). SingleRow is handled as well for this case.

You can check the more details in the attached patch. Any feedback is welcome.

--
Best Regards
Andy Fan (https://www.aliyun.com/)

Attachment Content-Type Size
v1-0001-design-uniquekey-v2.patch application/octet-stream 18.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2021-07-07 09:09:21 Re: track_planning causing performance regression
Previous Message David Rowley 2021-07-07 08:37:42 Re: Remove useless int64 range checks on BIGINT sequence MINVALUE/MAXVALUE values