Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
Date: 2019-01-14 16:08:27
Message-ID: 25688.1547482107@sss.pgh.pa.us
Lists: pgsql-hackers

James Coleman <jtc331(at)gmail(dot)com> writes:
> On Sun, Jan 13, 2019 at 3:06 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> There's still a logical problem here, which is that in order to
>> prove that the ScalarArrayOpExpr yields NULL when its LHS does,
>> you also have to prove that the RHS is not an empty array.

> I've constructed a test case running the test_predtest function on the
> expression
> "select x is null, x = any(array[]::int[]) from integers" which I
> think gets at the logic bug you're noting. In doing this I noticed
> something I don't fully understand: on master this shows that "x =
> any(array[]::int[])" proves "x is null" (while my patch shows the
> opposite). Is this because the second expression being false means
> there can be no valid value for x? It also claims it refutes the
> same, which is even more confusing.

That sounds like a bug, but I think it's actually OK because a vacuous
OR is necessarily false, and a false predicate "implies" any conclusion
at all according to the customary definition of implication. If there
are any real optimization bugs in this area, it's probably a result of
calling code believing more than it should about the meaning of a proof.
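(A minimal Python sketch, not PostgreSQL code, of SQL's three-valued semantics for "x = ANY(array)", illustrating the point above: an empty array yields FALSE no matter what the LHS is, even NULL, and a vacuously false predicate satisfies the customary definition of implication for any conclusion. The function names are illustrative only.)

```python
def eq_any(x, arr):
    """Three-valued x = ANY(arr); Python None models SQL NULL."""
    result = False
    for elem in arr:
        if x is None or elem is None:
            if result is False:
                result = None  # comparison is unknown, not false
        elif x == elem:
            return True        # a definite match wins
    return result

# Empty array: FALSE even when x is NULL.
print(eq_any(None, []))   # False
print(eq_any(1, []))      # False
print(eq_any(None, [1]))  # None (unknown), not False

# "A implies B" in the strong sense: whenever A is true, B is true.
# If A is never true (always false), the condition holds vacuously
# for any B at all -- hence the seemingly odd test_predtest output.
def strongly_implies(a_vals, b_vals):
    return all(b is True for a, b in zip(a_vals, b_vals) if a is True)
```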

I think these test cases don't actually prove much about the behavior
of your patch. Wouldn't they be handled by the generic OR-conversion
logic, since there are fewer than MAX_SAOP_ARRAY_SIZE items?

> It seems to me that this would mean that an IS NOT NULL index would
> still not be proven to be usable for a non-constant array (e.g., from
> a subquery calculated array), and in fact I've included a (failing)
> test demonstrating that fact in the attached patch. This feeds into my
> question above about there possibly being another corollary/proof that
> could be added for this case.

Hm. That would be annoying, but on reflection I think maybe we don't
actually need to worry about emptiness of the array after all. The
thing that we need to prove, at least for the implication case, is
"truth of the ScalarArrayOp implies truth of the IS NOT NULL clause".
Per above, if the ScalarArrayOp is necessarily false, then we can
claim that the implication holds. However, we need to work through
all four proof rules (strong vs. weak, implication vs refutation)
and see which ones this applies to --- I'm not sure offhand that
the logic works for all four. (ENOCAFFEINE...) In any case it
requires thinking a bit differently about what clause_is_strict_for()
is really doing.
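(A hedged sketch of working through those four rules by brute force, using the rule definitions as predtest.c's comments state them: strong implication is "truth of A implies truth of B", weak implication is "non-falsity of A implies non-falsity of B", strong refutation is "truth of A implies falsity of B", weak refutation is "truth of A implies non-truth of B". Here A is the always-false "x = ANY(ARRAY[]::int[])" and B is "x IS NOT NULL"; the checker itself is illustrative, not PostgreSQL code.)

```python
def a(x):
    # x = ANY(ARRAY[]::int[]) is FALSE for every x, NULL included
    return False

def b(x):
    # x IS NOT NULL is two-valued: it is never NULL itself
    return x is not None

DOMAIN = [1, None]  # a representative non-null value and SQL NULL

rules = {
    "strong implication": all(b(x) is True      for x in DOMAIN if a(x) is True),
    "weak implication":   all(b(x) is not False for x in DOMAIN if a(x) is not False),
    "strong refutation":  all(b(x) is False     for x in DOMAIN if a(x) is True),
    "weak refutation":    all(b(x) is not True  for x in DOMAIN if a(x) is True),
}
for name, holds in rules.items():
    print(name, holds)
```

Because A is never true (and never non-false), every antecedent is vacuous and all four rules report True, which is consistent with test_predtest claiming both a proof and a refutation for the empty-array case.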

> Semi-related: I noticed that predicate_classify handles an ArrayExpr
> by using list_length; this seems unnecessary if we know we can fail
> fast. I realize it's not a common problem, but I have seen extremely
> long arrays before, and maybe we should fall out of the count once we
> hit max+1 items.

Huh? list_length() is constant-time.

regards, tom lane
