Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: Zhihong Yu <zyu(at)yugabyte(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Floris Van Nee <florisvannee(at)optiver(dot)com>
Subject: Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)
Date: 2021-08-16 02:24:24
Message-ID: CAKU4AWo5ySsRQ1bJX9jsYN79zJY3Hp3p=FBfsn1X64oOW=SN=Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Zhihong,

On Mon, Aug 16, 2021 at 12:35 AM Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:
>
>
>
> On Sun, Aug 15, 2021 at 7:33 AM Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:
>>
>> Hi:
>>
>> I have finished the parts for baserel, joinrel, subquery, distinctrel. I think
>> the hardest ones have been verified. Here is the design overview.
>>
>> 1. Use EC instead of expr to cover more UniqueKey cases.
>> 2. Redesign the UniqueKey as below:
>>
>> @@ -246,6 +246,7 @@ struct PlannerInfo
>>
>> List *eq_classes; /* list of active EquivalenceClasses */
>> + List *unique_exprs; /* List of unique expr */
>>
>> bool ec_merging_done; /* set true once ECs are canonical */
>>
>> +typedef struct UniqueKey
>> +{
>> + NodeTag type;
>> + Bitmapset *unique_expr_indexes;
>> + bool multi_nulls;
>> +} UniqueKey;
>> +
>>
>> PlannerInfo.unique_exprs is a List of unique exprs. Unique Exprs is a set of
>> EquivalenceClass. for example:
>>
>> CREATE TABLE T1(A INT NOT NULL, B INT NOT NULL, C INT, pk INT primary key);
>> CREATE UNIQUE INDEX ON t1(a, b);
>>
>> SELECT DISTINCT * FROM T1 WHERE a = c;
>>
>> Then we would have PlannerInfo.unique_exprs as below
>> [
>> [EC(a, c), EC(b)],
>> [EC(pk)]
>> ]
>>
>> RelOptInfo(t1) would have 2 UniqueKeys.
>> UniqueKey1 {unique_expr_indexes=bms{0}, multinull=false]
>> UniqueKey2 {unique_expr_indexes=bms{1}, multinull=false]
>>
>> The design will benefit many table joins cases. For instance a 10- tables join,
>> each table has a primary key (a, b). Then we would have a UniqueKey like
>> this.
>>
>> JoinRel{1,2,3,4} - {t1.a, t1.b, t2.a, t2.b, t3.a, t3.b t4.a t4.b}
>> JoinRel{1,2,3,4,5} - {t1.a, t1.b, t2.a, t2.b, t3.a, t3.b t4.a t4.b t5.a t5.b}
>>
>> When more tables are joined, the list would be longer and longer, build the list
>> consumes both CPU cycles and memory.
>>
>> With the method as above, we can store it as:
>>
>> root->unique_exprs = /* All the UniqueKey is stored once */
>> [
>> [t1.a, t1.b], -- EC is ignored in document.
>> [t2.a, t2.b],
>> [t3.a, t3.b],
>> [t4.a, t4.b],
>> [t5.a, t5.b],
>> [t6.a, t6.b],
>> [t7.a, t7.b],
>> [t8.a, t8.b],
>> [t9.a, t9.b],
>> [t10.a, t10.b],
>> ]
>>
>> JoinRel{1,2,3,4} - Bitmapset{0,1,2,3} -- one bitmapword.
>> JoinRel{1,2,3,4,5} - Bitmapset{0,1,2,3,4} -- one bitmapword.
>>
>> The member in the bitmap is the index of root->unique_exprs rather than
>> root->eq_classes because we need to store the SingleRow case in
>> root->unqiue_exprs as well.
>>
>> 3. Define a new SingleRow node and use it in joinrel as well.
>>
>> +typedef struct SingleRow
>> +{
>> + NodeTag type;
>> + Index relid;
>> +} SingleRow;
>>
>> SELECT * FROM t1, t2 WHERE t2.pk = 1;
>>
>> root->unique_exprs
>> [
>> [t1.a, t1.b],
>> SingleRow{relid=2}
>> ]
>>
>> JoinRel{t1} - Bitmapset{0}
>> JoinRel{t2} - Bitmapset{1}
>>
>> JoinRelt{1, 2} Bitmapset{0, 1} -- SingleRow will never be expanded to dedicated
>> exprs.
>>
>> 4. Interesting UniqueKey to remove the Useless UniqueKey as soon as possible.
>>
>> The overall idea is similar with PathKey, I distinguish the UniqueKey for
>> distinct purpose and useful_for_merging purpose.
>>
>> SELECT DISTINCT pk FROM t; -- OK, maintain it all the time, just like
>> root->query_pathkey.
>>
>> SELECT DISTINCT t2.c FROM t1, t2 WHERE t1.d = t2.pk; -- T2's UniqueKey PK is
>> use before t1 join t2, but not useful after it.
>>
>> Currently the known issue I didn't pass the "interesting UniqueKey" info to
>> subquery well [2], I'd like to talk more about this when we discuss about
>> UnqiueKey on subquery part.
>>
>>
>> 5. relation_is_distinct_for
>>
>> Now I design the function as
>>
>> + bool
>> + relation_is_distinct_for(PlannerInfo *root, RelOptInfo *rel, List
>> *distinct_pathkey)
>>
>> It is "List *distinct_pathkey", rather than "List *exprs". The reason pathkey
>> has EC in it, and all the UniqueKey has EC as well. if so, the subset-like
>> checking is very effective. As for the distinct/group as no-op case, we have
>> pathkey all the time. The only drawback of it is some clauses are not-sortable,
>> in this case, the root->distinct_pathkey and root->group_pathkey is not
>> set. However it should be rare by practice, so I ignore this part for
>> now. After all, I can have a relation_is_disticnt_for version for Exprs. I just
>> not implemented it so far.
>>
>> 6. EC overhead in UnqiueKey & UNION case.
>>
>> Until now I didn't create any new EC for the UniqueKey case, I just used the
>> existing ones. However I can't handle the case like
>>
>> SELECT a, b FROM t1
>> UNION
>> SELECT x, y FROM t2;
>>
>> In this case, there is no EC created with existing code. and I don't want to
>> create them for the UniqueKey case as well. so my plan is just not to handle
>> the case for UNION.
>>
>> Since we need some big effort from the reviewer, I split the patch into many
>> smaller chunks.
>>
>> Patch 1 / Patch 2: I just split some code which UniqueKey uses but not very
>> interrelated. Splitting them out to reduce the core patch size.
>> Patch 3: still the notnull stuff. This one doesn't play a big role overall,
>> even if the design is changed at last, we can just modify some small stuff. IMO,
>> I don't think it is a blocker issue to move on.
>> Patch 4: Support the UniqueKey for baserel.
>> Patch 5: Support the UniqueKey for joinrel.
>> Patch 6: Support the UniqueKey for some upper relation, like distinctrel,
>> groupby rel.
>>
>> I'd suggest moving on like this:
>> 1. Have an overall review to see if any outstanding issues the patch has.
>> 2. If not, we can review and commit patch 1 & patch 2 to reduce the patch size.
>> 3. Decide which method to take for not null stuff. IMO Tom's method
>> would be a bit
>> complex and the benefit is not very clear to me[1]. So the choices
>> include: a). move on UniqueKey stuff until Tom's method is ready. b). Move
>> on the UniqueKey with my notnull way, and changes to Tom's method when
>> necessary. I prefer method b).
>> 4. Review & Commit the UniqueKey for BaseRel part.
>> 5. Review & Commit the UniqueKey for JoinRel part.
>> 6. Review & Commit the UniqueKey for SubQuery part *without* the Interesting
>> UniqueKey well handled.
>> 7. Review & Commit the UniqueKey for SubQuery part *with* the Interesting
>> UniqueKey well handled.
>> 8. Discuss about the UniqueKey on partitioned rel case and develop/review/commit
>> it.
>> 9. Apply UniqueKey stuff on more user cases rather than DISTINCT.
>>
>> What do you think about this?
>>
>> [1] https://www.postgresql.org/message-id/CAApHDvo5b2pYX%2BdbPy%2BysGBSarezRSfXthX32TZNFm0wPdfKGQ%40mail.gmail.com
>> [2] https://www.postgresql.org/message-id/CAKU4AWo6-%3D9mg3UQ5UJhGCMw6wyTPyPGgV5oh6dFvwEN%3D%2Bhb_w%40mail.gmail.com
>>
>>
>> Thanks
>
> Hi,
> For v3-0005-Support-UniqueKey-on-JoinRel.patch :
>
> +static void populate_joinrel_composited_uniquekey(PlannerInfo *root,
>
> populate_joinrel_composited_uniquekey -> populate_joinrel_composite_uniquekey (without the trailing d for composite)
>
> For populate_joinrel_uniquekeys():
>
> + foreach(lc, outerrel->uniquekeys)
> + {
> ...
> + return;
>
> Should the remaining unique keys be considered ?
>
> For populate_joinrel_uniquekey_for_rel():
>
> + else if (bms_equal(r->right_relids, rel->relids) && r->left_ec != NULL)
> + {
> + other_ecs = lappend(other_ecs, r->right_ec);
> + other_relids = bms_add_members(other_relids, r->left_relids);
>
> It seems the append to other_ecs is the same as the one for the `bms_equal(r->left_relids, rel->relids) && r->right_ec != NULL` case. Is this correct ?
>

Correct, both of them are bugs, I will fix them in the next version,
including the "tailing d".
Thanks for your review!

--
Best Regards
Andy Fan (https://www.aliyun.com/)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2021-08-16 02:44:36 Re: badly calculated width of emoji in psql
Previous Message Justin Pryzby 2021-08-16 01:32:55 Re: PoC/WIP: Extended statistics on expressions