Re: Tracking notnull attributes inside Var

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Subject: Re: Tracking notnull attributes inside Var
Date: 2022-05-20 05:18:32
Message-ID: CAKU4AWr=0rm=GoALn3dSwYT7zKFE0VHSt-sAnL1ECCPBV16MQQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Ashutosh:

Nice to see you again!

On Tue, May 17, 2022 at 8:50 PM Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
wrote:

> On Sun, May 15, 2022 at 8:41 AM Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com> wrote:
>
> >
> > The var in RelOptInfo->reltarget should have nullable = 0 but the var in
> > RelOptInfo->baserestrictinfo should have nullable = 1; The beauty of
> this
> > are: a). It can distinguish the two situations perfectly b). Whenever we
> want
> > to know the nullable attribute of a Var for an expression, it is super
> easy to
> > know. In summary, we need to maintain the nullable attribute at 2
> different
> > places. one is the before the filters are executed(baserestrictinfo,
> joininfo,
> > ec_list at least). one is after the filters are executed
> (RelOptInfo.reltarget
> > only?)
>
> Thanks for identifying this. What you have written makes sense and it
> might open a few optimization opportunities. But let me put down some
> other thoughts here. You might want to take those into consideration
> when designing your solution.
>

Thanks.

>
> Do we want to just track nullable and non-nullable. May be we want
> expand this class to nullable (var may be null), non-nullable (Var is
> definitely non-NULL), null (Var will be always NULL).
>
>
Currently it doesn't support "Var will be always NULL" . Do you have any
use cases for this? and I can't think of too many cases where we can get
such information except something like "SELECT a FROM t WHERE a
IS NULL".

But the other way to look at this is along the lines of equivalence
> classes. Equivalence classes record the expressions which are equal in
> the final result of the query. The equivalence class members are not
> equal at all the stages of query execution. But because they are
> equal in the final result, we can impose that restriction on the lower
> levels as well. Can we think of nullable in that fashion? If a Var is
> non-nullable in the final result, we can impose that restriction on
> the intermediate stages since rows with NULL values for that Var will
> be filtered out somewhere. Similarly we could argue for null Var. But
> knowledge that a Var is nullable in the final result does not impose a
> NULL, non-NULL restriction on the intermediate stages. If we follow
> this thought process, we don't need to differentiate Var at different
> stages in query.
>

I agree this is an option. If so we need to track it under the PlannerInfo
struct but it would not be as fine-grained as my previous. Without
intermediate information, We can't know if a UnqiueKey contains multiple
NULLs, this would not be an issue for the "MARK Distinct as no-op" case,
but I'm not sure it is OK for other UniqueKey user cases. So my current
idea
is I still prefer to maintain the intermediate information, unless we are
sure it
costs too much or it is too complex to implement which I don't think so for
now
at least. So if you have time to look at the attached patch, that would be
super
great as well.

--
Best Regards
Andy Fan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-05-20 05:25:10 Re: 15beta1 test failure on mips in isolation/expected/stats
Previous Message Andy Fan 2022-05-20 04:42:41 Re: Tracking notnull attributes inside Var