Re: apply outer->inner join optimisation to OR clauses

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bradley Baetz <bbaetz(at)acm(dot)org>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: apply outer->inner join optimisation to OR clauses
Date: 2003-05-04 00:53:14
Message-ID: 20085.1052009594@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-patches

Bradley Baetz <bbaetz(at)acm(dot)org> writes:
> I don't think that the fact that |SELECT foo>2 OR NULL| gives NULL, not
> FALSE, if foo <=2 is an issue either, since they're both not a true
> value.

But that is exactly the distinction that we have to worry about when not
at top level. The error in the AND part of the proposed patch is
exhibited by
WHERE NOT ((a.a1 > 1) AND (b.b1 > 1))
When a.a1 is NULL, the AND can't yield TRUE --- but it can yield FALSE,
which will become TRUE at the top level. So neither a nor b can be
considered non-nullable in this expression.

You do have an insight here, which is that if the same rel can be shown
to null *all* the arms of an OR, it nulls the OR. The same holds true
of an AND, I think.

You might be right that passing top_level through the AND or OR would be
safe in the present state of the code, but it strikes me as fragile and
confusing to leave it that way, because it no longer means what its name
implies and the comments say. ISTM what we need to do if we want to
extend this code is to come up with a stronger definition of the context
in which each recursive level gets invoked. I think it might work to
define a three-way context value instead of a boolean:

SAFE_IF_NULL a rel is nonnullable if nulling it forces this
expression to return NULL
SAFE_IF_NULL_OR_FALSE a rel is nonnullable if nulling it forces this
expression to return NULL or FALSE
SAFE_IF_NULL_OR_TRUE a rel is nonnullable if nulling it forces this
expression to return NULL or TRUE

The present notion of top-level would be replaced by context
SAFE_IF_NULL_OR_FALSE. Descent through a strict operator or function
always narrows the context to SAFE_IF_NULL (since we don't know what
the op/func might do with other values, and we can't even be sure we're
talking about booleans anymore anyway). Descent through NOT exchanges
SAFE_IF_NULL_OR_FALSE and SAFE_IF_NULL_OR_TRUE. You could make
appropriate extensions of the code for NullTest and BooleanTest, and
I think this would allow sane handling of the AND and OR cases: for
example, AND with SAFE_IF_NULL_OR_FALSE context really can union the
inputs' nonnullable sets (since forcing any one input to null or false
guarantees the AND result is null or false) but AND in the other
contexts has to intersect the inputs' sets. I think also that AND and
OR might be able to pass down more liberal contexts than they get,
but am not sure about that part.

(BTW, I don't quite like the adjective SAFE here; can you think of a
better term?)

The SAFE_IF_NULL_OR_TRUE context might prove not to be worth the trouble
of implementing --- I'm not sure if either AND or OR can really exploit
it any better than they can exploit SAFE_IS_NULL. If not, we'd be back
to just a boolean context value, but we'd have a clearer understanding
of what it really means.

I don't have time to slog through the details, but if you want to run
with that, go for it ...

regards, tom lane

In response to

Responses

Browse pgsql-patches by date

  From Date Subject
Next Message Bradley Baetz 2003-05-04 01:14:58 Re: apply outer->inner join optimisation to OR clauses
Previous Message Bradley Baetz 2003-05-04 00:15:54 Re: apply outer->inner join optimisation to OR clauses