Re: Strange query behaviour

From: Isaac Morland <isaac(dot)morland(at)gmail(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Strange query behaviour
Date: 2019-01-22 21:47:10
Message-ID: CAMsGm5d=y+M9mWYn=SWSsoXELCiK0wBEVFMxAjr5RT4MZGF85A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 22 Jan 2019 at 15:32, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
wrote:

> >>>>> "Isaac" == Isaac Morland <isaac(dot)morland(at)gmail(dot)com> writes:
>
> Isaac> So it is as if checking the whole tuple for NULL requires
> Isaac> reading the PDF bytea columns, but checking just the primary key
> Isaac> for NULL or even reading the lengths of the PDFs does not.
>
> That is almost certainly exactly what happens. If the PDF columns are of
> type bytea, or if they are of type text and the database encoding is
> single-byte, then length() does not need to detoast the column in order
> to get the size (the byte size of the toasted datum is stored in the
> toast pointer).
>

Thanks very much for the detailed response! I just tested and indeed query
performance has gone back to something like what I would expect. I feel
more confident, however, with your confirmation and elaboration on the
underlying details.

What you should actually use in these cases for your IS NULL check is
> one of the columns of the join condition. That allows the planner to
> detect that the query is in fact an anti-join, and optimize accordingly.
>
> The other, and IMO better, way to write anti-join queries is to use an
> explicit NOT EXISTS. (Note, do _not_ use NOT IN, since that has its own
> issues with NULL handling.)
>

Thanks for the hint. As it happens, in the actual situation, I had a view
which is defined something like:

CREATE ... AS
SELECT ..., NOT y IS NULL AS has_y
FROM x LEFT JOIN y USING (primary_key_field);

I used the simpler example when I found it would exhibit the same symptoms.
However, using a join key field as you suggest still seems to be working to
fix my problem.

Isaac> 1) when checking an entire row for null,
>
> This isn't a very common operation and the SQL-standard semantics for it
> are actually quite weird (for example, x IS NULL is not the opposite
> condition to x IS NOT NULL). So I don't think we need to encourage it.
>

In my use case, I think "y IS NULL" is better code than
"y.primary_key_field IS NULL". In the second snippet, it raises the
question, "why primary_key_field?" The answer is, "because otherwise the
query planner will get confused". The first snippet is what I really mean,
and also happens to be shorter.

This does require people reading the code to understand that "IS NULL" and
"IS NOT NULL" are not logical negations of each other.

> The planner can do even better than this if you apply the IS NULL test
> _specifically to one of the join columns_. When a join condition is
> strict (which it almost always is), then testing the column from the
> nullable side is provably (to the planner) equivalent to testing the
> existence of the matching row, which allows it to transform the join
> from an outer join to an anti-join.
>

What is confusing me is why the planner can't convert "[entire row] IS
NULL" into a test for existence of the matching row (assuming there is at
least one NOT NULL column).

OK, let's put it this way: if I manage to dig into the planner internals
enough to figure out how to make the planner understand this, and write a
decent patch, would I have a good chance of getting it accepted? From here
the figuring out part seems like a long-shot: messing with planner code
scares me a little and I already have a feature request that I want to work
on and a reasonable workaround for this one, but I'd like an assessment
nevertheless. Maybe I'll have more energy this year!

Thanks again for taking the time to provide a detailed response. I very
much appreciate it.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2019-01-22 22:22:48 Re: Rare SSL failures on eelpout
Previous Message Alvaro Herrera 2019-01-22 21:45:48 Re: partitioned tables referenced by FKs