Re: Strange query behaviour

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Isaac Morland <isaac(dot)morland(at)gmail(dot)com>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Strange query behaviour
Date: 2019-01-23 05:17:44
Message-ID: 1817.1548220664@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Isaac Morland <isaac(dot)morland(at)gmail(dot)com> writes:
> What is confusing me is why the planner can't convert "[entire row] IS
> NULL" into a test for existence of the matching row (assuming there is at
> least one NOT NULL column).

The reasons why the planner does very little with row-level IS [NOT] NULL
conditions are (1) so few people use them that it doesn't really seem
worth expending cycles or development effort on such cases, and (2) the
SQL spec is vague enough about the semantics of these predicates that
we've never been entirely sure whether we implement them correctly. Thus
it didn't seem worth expending a lot of effort developing deduction logic
that might turn out to be completely wrong.

I suspect (1) is not unrelated to (2) ...

The semantic vaguenesses are also twofold:

(a) It's not quite clear whether the spec intends to draw a distinction
between a composite value that is in itself NULL and one that is a tuple
of all NULL fields. There is certainly a physical difference, but it
looks like the IS [NOT] NULL predicates are designed not to be able to
tell the difference.

(b) It's not at all clear whether these predicates are meant to be
recursive for nested composite types. Depending on how you read it,
it could be that a composite field that is NULL satisfies an IS NULL
predicate on the parent row, but a composite field that is ROW(NULL,
NULL, ...) does not. That is in fact how we implement it, but it
sure seems weird given (a).

So personally, I've got zero confidence in these predicates and don't
especially wish to sink development effort into something that
critically depends on having the right semantics for them. The shortage
of field demand for doing better doesn't help.

Circling back to your interest in using a "row IS NULL" predicate to
conclude that a left join is actually an antijoin, these questions are
really critical, because if the join column is itself composite, the
deduction would hold *only* if our theory that "row IS NULL" is
non-recursive is correct. If our theory is wrong, and the spec
intends that ROW(NULL, ROW(NULL, NULL), NULL) IS NULL should be TRUE,
then it wouldn't be correct to draw the inference that the join has
to be an antijoin. And that is also connected to the fact that
record_cmp considers ROW(NULL, NULL) to be equal to ROW(NULL, NULL),
which is (or at least seems to be) wrong per spec, but tough: we have
to have a total order for composite values, or they wouldn't be
indexable or sortable.

If your head's not hurting yet, you just need to think harder
about these issues. It's all a mess, and IMO we're best off
not adding any more dependencies on these semantics than we
have to.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Moon, Insung 2019-01-23 05:18:49 Typo: pgbench.c
Previous Message Alexander Korotkov 2019-01-23 05:02:26 Re: jsonpath