Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

From: James Coleman <jtc331(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
Date: 2019-02-27 02:13:23
Message-ID: CAAaqYe_QPVZMEUb_vTP3zeNLrLRZZf2pdM=181HisvMpwoVKqA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 18, 2019 at 4:53 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> So ... this is actively wrong.
>
> This case shows that you can't ignore the empty-array possibility
> for a ScalarArrayOpExpr with useOr = false, because
> "SELECT null::int = all(array[]::int[])" yields TRUE:
>
> contrib_regression=# select * from test_predtest($$
> select x is not null, x = all(array(select i from generate_series(1,0) i))
> from integers
> $$);
> WARNING: strong_implied_by result is incorrect
> -[ RECORD 1 ]-----+--
> strong_implied_by | t
> weak_implied_by | f
> strong_refuted_by | f
> weak_refuted_by | f
> s_i_holds | f
> w_i_holds | f
> s_r_holds | f
> w_r_holds | f
>
> This case shows that you can't ignore the distinction between null
> and false results once you've descended through strict function(s):
>
> contrib_regression=# select * from test_predtest($$
> select x is not null, strictf(true, x = any(array(select i from generate_series(1,0) i)))
> from integers
> $$);
> WARNING: strong_implied_by result is incorrect
> -[ RECORD 1 ]-----+--
> strong_implied_by | t
> weak_implied_by | f
> strong_refuted_by | f
> weak_refuted_by | f
> s_i_holds | f
> w_i_holds | f
> s_r_holds | f
> w_r_holds | f
>
> (In this usage, the strictf() function from the test_predtest.sql
> will simply return the NOT of its second input; so it gives different
> answers depending on whether the SAOP returned false or null.)
>
> I think you could probably repair the second problem by adding
> an additional argument to clause_is_strict_for() indicating whether
> it has to prove the clause to be NULL or merely non-TRUE; when
> recursing you'd always have to ask for the former.

I've implemented this fix as suggested. The added argument I've
initially called "allow_false"; I don't love the name, but it is
directly what it does. I'd appreciate other suggestions (if you prefer
it change).

> The first problem could be resolved by insisting on being able to
> prove the array non-empty when !useOr, but I'm not sure if it's
> really worth the trouble, as opposed to just not trying to prove
> anything for !useOr cases. I'm not sure that "x op ALL(ARRAY)"
> occurs often enough in the wild to justify expending code on.
>
> The other case where being able to prove the array nonempty might
> be worth something is when you've recursed and hence need to be
> able to prove the SAOP's result to be NULL not just not-TRUE.
> Again though, I don't know how often that occurs in practice,
> so even the combination of those two motivations might not be
> worth having code to check the array argument.

I've implemented this also; the thing that puts it over the edge for
me on the question of "is it worth it in the wild" is cases like "x
!op ALL(ARRAY)", since it seems plausible to me that that's (unlike
the non-negated case) a reasonably common operation. At the very least
it seems likely to be a more logically interesting operation, which
I'm taking as a proxy for common, I suppose.

> It'd also be worth spending some brain cells on what happens
> when the SAOP's array argument is null overall, which causes
> its result to be null (not false). Maybe the logic goes
> through without needing any explicit test for that case
> (and indeed I couldn't break it in testing that), but it'd
> likely be worth a comment.

Since all SAOPs (both ALL and ANY) return NULL when the array is NULL
regardless of the LHS argument, then this case qualifies as strict.
I've included this in the above fix since both NULL constant arrays
and non-empty arrays can both be proven strict with strict operators.

Ideally I think this case would be handled elsewhere in the optimizer
by directly replacing the saop and a constant NULL array with NULL,
but this isn't the patch to do that, and at any rate I'd have to do a
bit more investigation to understand how to do such (if it's easily
possible).

> I don't especially care for the proposed function name
> "clause_proves_for_null_test". The only thing that brings to my
> mind is the question "proves WHAT, exactly?" --- and as these
> examples show, being crystal clear on what it proves is essential.

Given the new argument, I've reverted the name change.

I also updated the tests with a new helper function "opaque_array" to
make it easy to test cases where the planner can't inspect the array.
That way we don't need to rely on CTEs as an optimization fence --
thus improving both maintainability and readability. I also noticed
there were also some tests where the array was already opaque yet I
was still using CTEs as an optimization fence; I've cleaned those up.

I've updated the comments significantly to reflect the above changes.

I've attached an updated (and rebased) patch.

James Coleman

Attachment Content-Type Size
saop_is_not_null-v10.patch text/x-patch 36.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2019-02-27 02:21:58 Re: Problem with default partition pruning
Previous Message Haribabu Kommi 2019-02-27 02:06:51 Re: Pluggable Storage - Andres's take