Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Floris Van Nee <florisvannee(at)optiver(dot)com>
Subject: Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)
Date: 2021-02-16 15:01:44
Message-ID: CAKU4AWrsM6YeMCXaBok4ybxcVRi1ns-V0CkLUhJ3RUoksWruoQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 16, 2021 at 10:03 PM Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:

>
>
> On Tue, Feb 16, 2021 at 12:01 PM David Rowley <dgrowleyml(at)gmail(dot)com>
> wrote:
>
>> On Fri, 12 Feb 2021 at 15:18, Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:
>> >
>> > On Fri, Feb 12, 2021 at 9:02 AM David Rowley <dgrowleyml(at)gmail(dot)com>
>> wrote:
>> >> The reason I don't really like this is that it really depends where
>> >> you want to use RelOptInfo.notnullattrs. If someone wants to use it
>> >> to optimise something before the base quals are evaluated then they
>> >> might be unhappy that they found some NULLs.
>> >>
>> >
>> > Do you mean the notnullattrs is not set correctly before the base quals
>> are
>> > evaluated? I think we have lots of data structures which are set just
>> after some
>> > stage. but notnullattrs is special because it is set at more than 1
>> stage. However
>> > I'm doubtful it is unacceptable, Some fields ever change their meaning
>> at different
>> > stages like Var->varno. If a user has a misunderstanding on it, it
>> probably will find it
>> > at the testing stage.
>>
>> You're maybe focusing too much on your use case for notnullattrs. It
>> only cares about NULLs in the result for each query level.
>>
>> .... thinks of an example...
>>
>> OK, let's say I decided that COUNT(*) is faster than COUNT(id) so
>> decided that I might like to write a patch which rewrite the query to
>> use COUNT(*) when it was certain that "id" could not contain NULLs.
>>
>> The query is:
>>
>> SELECT p.partid, p.partdesc,COUNT(s.saleid) FROM part p LEFT OUTER
>> JOIN sales s ON p.partid = s.partid GROUP BY p.partid;
>>
>> sale.saleid is marked as NOT NULL in pg_attribute. As the writer of
>> the patch, I checked the comment for notnullattrs and it says "Not
>> null attrs, start from -FirstLowInvalidHeapAttributeNumber", so I
>> should be ok to assume since sales.saleid is marked in notnullattrs
>> that I can rewrite the query?!
>>
>> The documentation about the RelOptInfo.notnullattrs needs to be clear
>> what exactly it means. I'm not saying your representation of how to
>> record NOT NULL in incorrect. I'm saying that you need to be clear
>> what exactly is being recorded in that field.
>>
>> If you want it to mean "attribute marked here cannot output NULL
>> values at this query level", then you should say something along those
>> lines.
>>
>
> I think I get what you mean. You are saying notnullattrs is only correct
> at the *current* stage, namely set_rel_size. It would not be true after
> that, but the data is still there. That would cause some confusion. I
> admit
> that is something I didn't realize before. I checked other fields of
> RelOptInfo,
> looks no one filed works like this, so I am not really happy with this
> design
> now. I'm OK with saying more things along these lines. That can be done
> we all understand each other well. Any better design is welcome as well.
> I think the "Var represents null stuff" is good, until I see your
> comments below.
>
>
>
>> However, having said that, because this is a Bitmapset of
>> pg_attribute.attnums, it's only possible to record Vars from base
>> relations. It does not seem like you have any means to record
>> attributes that are normally NULLable, but cannot produce NULL values
>> due to a strict join qual.
>>
>> e.g: SELECT t.nullable FROM t INNER JOIN j ON t.nullable = j.something;
>>
>> I'd expect the RelOptInfo for t not to contain a bit for the
>> "nullable" column, but there's no way to record the fact that the join
>> RelOptInfo for {t,j} cannot produce a NULL for that column. It might
>> be quite useful to know that for the UniqueKeys patch.
>>
>>
> The current patch can detect t.nullable is not null correctly. That
> is done by find_nonnullable_vars(qual) and deconstruct_recure stage.
>
>
>> I know there's another discussion here between Ashutosh and Tom about
>> PathTarget's and Vars. I had the Var idea too once myself [1] but it
>> was quickly shot down. Tom's reasoning there in [1] seems legit. I
>> guess we'd need some sort of planner version of Var and never confuse
>> it with the Parse version of Var. That sounds like quite a big
>> project which would have quite a lot of code churn. I'm not sure how
>> acceptable it would be to have Var represent both these things. It
>> gets complex when you do equal(var1, var2) and expect that to return
>> true when everything matches apart from the notnull field. We
>> currently have this issue with the "location" field and we even have a
>> special macro which just ignores those in equalfuncs.c. I imagine not
>> many people would like to expand that to other fields.
>>
>>
> Thanks for sharing this.
>
>
>> It would be good to agree on the correct representation for Vars that
>> cannot produce NULLs so that we don't shut the door on classes of
>> optimisation that require something other than what you need for your
>> case.
>>
>>
> Agreed. The simplest way is just adding some comments. If go a
> step further, how about just reset the notnullattrs when it is nullable
> later like outer join? I have added this logic in the attached patch.
> (comment for the notnullattrs is still not touched). I think we only
> need to handle this in build_join_rel stage.
>

..

> With the v2 commit 2,
> notnullattrs might be unset too early, but if the value is there, then
> it is correct.
>
>
This looks bad as well. How about adding an extra field in RelOptInfo for
the
outer join case. For example:

@@ -710,8 +710,14 @@ typedef struct RelOptInfo
PlannerInfo *subroot; /* if subquery */
List *subplan_params; /* if subquery */
int rel_parallel_workers; /* wanted number of
parallel workers */
- /* Not null attrs, start from -FirstLowInvalidHeapAttributeNumber */
+ /*
+ * Not null attrs, start from -FirstLowInvalidHeapAttributeNumber.
The nullness
+ * might be changed after outer join, So we need to consult with
leftouter_relids
+ * before using it.
+ */
Bitmapset *notnullattrs;
+ /* A list of Relids which will be a outer rel when join with this
relation. */
+ List *leftouter_relids;

/* Information about foreign tables and foreign joins */
Oid serverid; /* identifies
server for the table or join */

leftout_relids should be able to be filled with root->join_info_list. If
we go with this
direction, not sure leftouter_relids should be a List or not since I even
can't think
out a query which can have more than one relids for a relation.

--
Best Regards
Andy Fan (https://www.aliyun.com/)

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2021-02-16 15:04:44 Re: POC: postgres_fdw insert batching
Previous Message Daniel Gustafsson 2021-02-16 14:51:14 Re: [Proposal] Page Compression for OLTP