Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Floris Van Nee <florisvannee(at)optiver(dot)com>
Subject: Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)
Date: 2021-03-05 02:22:45
Message-ID: CAKU4AWr+4C0EJRMwWe8BBjagKFN61-HcOf6ACum0tsMzX28MGA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 5, 2021 at 12:00 AM Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote:

> > On Thu, Feb 18, 2021 at 08:58:13PM +0800, Andy Fan wrote:
>
> Thanks for continuing work on this patch!
>
> > On Tue, Feb 16, 2021 at 12:01 PM David Rowley <dgrowleyml(at)gmail(dot)com>
> wrote:
> >
> > > On Fri, 12 Feb 2021 at 15:18, Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
> wrote:
> > > >
> > > > On Fri, Feb 12, 2021 at 9:02 AM David Rowley <dgrowleyml(at)gmail(dot)com>
> > > wrote:
> > > >> The reason I don't really like this is that it really depends where
> > > >> you want to use RelOptInfo.notnullattrs. If someone wants to use it
> > > >> to optimise something before the base quals are evaluated then they
> > > >> might be unhappy that they found some NULLs.
> > > >>
> > > >
> > > > Do you mean the notnullattrs is not set correctly before the base
> quals
> > > are
> > > > evaluated? I think we have lots of data structures which are set
> just
> > > after some
> > > > stage. but notnullattrs is special because it is set at more than 1
> > > stage. However
> > > > I'm doubtful it is unacceptable, Some fields ever change their
> meaning
> > > at different
> > > > stages like Var->varno. If a user has a misunderstanding on it, it
> > > probably will find it
> > > > at the testing stage.
> > >
> > > You're maybe focusing too much on your use case for notnullattrs. It
> > > only cares about NULLs in the result for each query level.
> > >
> > > .... thinks of an example...
> > >
> > > OK, let's say I decided that COUNT(*) is faster than COUNT(id) so
> > > decided that I might like to write a patch which rewrite the query to
> > > use COUNT(*) when it was certain that "id" could not contain NULLs.
> > >
> > > The query is:
> > >
> > > SELECT p.partid, p.partdesc,COUNT(s.saleid) FROM part p LEFT OUTER
> > > JOIN sales s ON p.partid = s.partid GROUP BY p.partid;
> > >
> > > sale.saleid is marked as NOT NULL in pg_attribute. As the writer of
> > > the patch, I checked the comment for notnullattrs and it says "Not
> > > null attrs, start from -FirstLowInvalidHeapAttributeNumber", so I
> > > should be ok to assume since sales.saleid is marked in notnullattrs
> > > that I can rewrite the query?!
> > >
> > > The documentation about the RelOptInfo.notnullattrs needs to be clear
> > > what exactly it means. I'm not saying your representation of how to
> > > record NOT NULL in incorrect. I'm saying that you need to be clear
> > > what exactly is being recorded in that field.
> > >
> > > If you want it to mean "attribute marked here cannot output NULL
> > > values at this query level", then you should say something along those
> > > lines.
> > >
> > > However, having said that, because this is a Bitmapset of
> > > pg_attribute.attnums, it's only possible to record Vars from base
> > > relations. It does not seem like you have any means to record
> > > attributes that are normally NULLable, but cannot produce NULL values
> > > due to a strict join qual.
> > >
> > > e.g: SELECT t.nullable FROM t INNER JOIN j ON t.nullable = j.something;
> > >
> > > I'd expect the RelOptInfo for t not to contain a bit for the
> > > "nullable" column, but there's no way to record the fact that the join
> > > RelOptInfo for {t,j} cannot produce a NULL for that column. It might
> > > be quite useful to know that for the UniqueKeys patch.
> > >
> >
> > I checked again and found I do miss the check on JoinExpr->quals. I have
> > fixed it in v3 patch. Thanks for the review!
> >
> > In the attached v3, commit 1 is the real patch, and commit 2 is just add
> > some logs to help local testing. notnull.sql/notnull.out is the test
> case
> > for
> > this patch, both commit 2 and notnull.* are not intended to be committed
> > at last.
>
> Just to clarify, this version of notnullattrs here is the latest one,
> and another one from "UniqueKey on Partitioned table" thread should be
> disregarded?
>

Actually they are different sections for UniqueKey. Since I don't want to
mess
two topics in one place, I open another thread. The topic here is how to
represent
a not null attribute, which is a precondition for all UniqueKey stuff. The
thread
" UniqueKey on Partitioned table[1] " is talking about how to maintain the
UniqueKey on a partitioned table only.

>
> > Besides the above fix in v3, I changed the comments alongs the
> notnullattrs
> > as below and added a true positive helper function is_var_nullable.
>
> With "true positive" you mean it will always correctly say if a Var is
> nullable or not?

not null. If I say it is not null (return value is false), it is not null
for sure. If
it is nullable (true), it may be still not null for some stages. But I
don't want
to distinguish them too much, so I just say it is nullable.

> I'm not sure about this, but couldn't be there still
> some cases when a Var belongs to nullable_baserels, but still has some
> constraints preventing it from being nullable (e.g. a silly example when
> the not nullable column belong to the table, and the query does full
> join of this table on itself using this column)?
>
> Do you say something like "SELECT * FROM t1 left join t2 on t1.a = t2.a
WHERE
t2.b = 3; "? In this case, the outer join will be reduced to inner join
at
reduce_outer_join stage, which means t2 will not be shown in
nullable_baserels.

> Is this function necessary for the following patches? I've got an
> impression that the discussion in this thread was mostly evolving about
> correct description when notnullattrs could be used, not making it
> bullet proof.
>

Exactly, that is the blocker issue right now. I hope more authorities can
give
some suggestions to move on.

> > Bitmapset *notnullattrs;
>
> It looks like RelOptInfo has its own out function _outRelOptInfo,
> probably the notnullattrs should be also present there as BITMAPSET_FIELD?
>
>
Yes, it should be added.

> As a side note, I've attached those two new threads to CF item [1],
> hopefully it's correct.
>
> [1]: https://commitfest.postgresql.org/32/2433/
>

Thanks for doing that. It is correct.

[1]
https://www.postgresql.org/message-id/CAKU4AWrU35c9g3cE15JmVwh6B2Hzf4hf7cZUkRsiktv7AKR3Ag@mail.gmail.com

--
Best Regards
Andy Fan (https://www.aliyun.com/)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message miyake_kouta 2021-03-05 02:26:45 Re: [PATCH] pgbench: Bug fix for the -d option
Previous Message Tom Lane 2021-03-05 01:58:39 Re: [PATCH] remove deprecated v8.2 containment operators