Proving IS NOT NULL inference for ScalarArrayOpExpr's

From: James Coleman <jtc331(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Proving IS NOT NULL inference for ScalarArrayOpExpr's
Date: 2018-11-10 21:33:26
Message-ID: CAAaqYe8yKSvzbyu8w-dThRs9aTFMwrFxn_BkTYeXgjqe3CbNjg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I've recently been investigating improving our plans for queries like:
SELECT * FROM t WHERE t.foo IN (1, 2..1000);
where the table "t" has a partial index on "foo" where "foo IS NOT NULL".

Currently the planner generates an index [only] scan so long as the number
of
items in the IN expression is <= 100, but as soon as you add the 101st item
it reverts to seq scan. If we add the explicit null check like:
SELECT * FROM t WHERE t.foo IN (1, 2..1000) AND foo IS NOT NULL;
then we go back to the desired index scan.

This occurs because predtest.c stops expanding ScalarArrayOpExpr's with
constant array arguments into OR trees when the array size is > 100. The
rest
of the predicate proving code then becomes unable to infer that foo is not
null
and therefore the planner cannot prove that the partial index is correct to
use.

(Please pardon technical details in the below background that may be way
off;
I don't have a lot of experience with the Postgres codebase yet, and am
still
trying to build a mental model of things.)

At first I was imagining having the parse keep track of whether an array
const
expr contained any nulls and perhaps adding generated quals (in an
equivalence
class?) to allow the planner to easily prove the index was correct. I'd been
going down this track because in my mind the issue was because the planner
needed to verify whether all of the array elements were not null.

But as I started to dig into the predtest.c NOT NULL proofs and add test
cases,
I realized that at least in many normal op cases we can safely infer that
foo
is not null when "foo <op> <array>" is true even if the array contains null
elements.

This is such a simple change that it seems like I must be missing a case
where
the above doesn't hold true, but I can't immediately think of any, and
indeed
with the attached patch all existing tests pass (including some additional
ones I added for predtest to play around with it).

Am I missing something obvious? Is this a valid approach?

Other outstanding questions:

Should I add additional tests for predtest? It already seems to cover some
null
test cases with scalar array ops, but I'd be happy to add more if desired.

Should I add a test case for the resulting plan with "foo IN (...)" with an
array with more than 100 elements?

Thanks,
James Coleman

Attachment Content-Type Size
saop_is_not_null-v1.patch application/octet-stream 548 bytes

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2018-11-10 22:29:54 Re: Skylake-S warning
Previous Message Andres Freund 2018-11-10 19:37:03 Re: [HACKERS] Fix performance degradation of contended LWLock on NUMA