Re: Strange query behaviour

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Isaac Morland <isaac(dot)morland(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Strange query behaviour
Date: 2019-01-22 20:32:22
Message-ID: 87va2g30wc.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>>>>> "Isaac" == Isaac Morland <isaac(dot)morland(at)gmail(dot)com> writes:

Isaac> So it is as if checking the whole tuple for NULL requires
Isaac> reading the PDF bytea columns, but checking just the primary key
Isaac> for NULL or even reading the lengths of the PDFs does not.

That is almost certainly exactly what happens. If the PDF columns are of
type bytea, or if they are of type text and the database encoding is
single-byte, then length() does not need to detoast the column in order
to get the size (the byte size of the toasted datum is stored in the
toast pointer).

However, constructing a whole-row datum does require detoasting any
externally-stored columns (this used not to be the case, but that caused
a lot of bugs).

Isaac> For the moment I'm going to fix it by just using
Isaac> "y.primary_key_column IS NULL" instead of "y IS NULL" where I
Isaac> want to check whether I have a row from y corresponding to a
Isaac> given row in x.

What you should actually use in these cases for your IS NULL check is
one of the columns of the join condition. That allows the planner to
detect that the query is in fact an anti-join, and optimize accordingly.

The other, and IMO better, way to write anti-join queries is to use an
explicit NOT EXISTS. (Note, do _not_ use NOT IN, since that has its own
issues with NULL handling.)

Isaac> 1) when checking an entire row for null,

This isn't a very common operation and the SQL-standard semantics for it
are actually quite weird (for example, x IS NULL is not the opposite
condition to x IS NOT NULL). So I don't think we need to encourage it.

Isaac> start with a primary key field or other NOT NULL field. In the
Isaac> common case of checking what happened with a left join, this is
Isaac> all that needs to be done - either there is a row, in which case
Isaac> the field cannot be NULL, or there is no row and all the other
Isaac> fields must also be NULL.

The planner can do even better than this if you apply the IS NULL test
_specifically to one of the join columns_. When a join condition is
strict (which it almost always is), then testing the column from the
nullable side is provably (to the planner) equivalent to testing the
existence of the matching row, which allows it to transform the join
from an outer join to an anti-join.

--
Andrew (irc:RhodiumToad)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2019-01-22 20:35:27 Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD
Previous Message Thomas Munro 2019-01-22 20:31:53 Re: Refactoring the checkpointer's fsync request queue